
24 December 2018

Data Science: Variance (Just the Quotes)

"There is, then, in this analysis of variance no indication of any other than innate and heritable factors at work." (Sir Ronald A Fisher, "The Causes of Human Variability", Eugenics Review Vol. 10, 1918)

"The mean and variance are unambiguously determined by the distribution, but a distribution is, of course, not determined by its mean and variance: A number of different distributions have the same mean and the same variance." (Richard von Mises, "Probability, Statistics And Truth", 1928)

"However, perhaps the main point is that you are under no obligation to analyse variance into its parts if it does not come apart easily, and its unwillingness to do so naturally indicates that one’s line of approach is not very fruitful." (Sir Ronald A Fisher, [Letter to Lancelot Hogben] 1933)

"The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic." (Sir Ronald A Fisher, Journal of the Royal Statistical Society Vol. 1, 1934)

"Undoubtedly one of the most elegant, powerful, and useful techniques in modern statistical method is that of the Analysis of Variation and Co-variation by which the total variation in a set of data may be reduced to components associated with possible sources of variability whose relative importance we wish to assess. The precise form which any given analysis will take is intimately connected with the structure of the investigation from which the data are obtained. A simple structure will lead to a simple analysis; a complex structure to a complex analysis." (Michael J Moroney, "Facts from Figures", 1951)

"The statistics themselves prove nothing; nor are they at any time a substitute for logical thinking. There are […] many simple but not always obvious snags in the data to contend with. Variations in even the simplest of figures may conceal a compound of influences which have to be taken into account before any conclusions are drawn from the data." (Alfred R Ilersic, "Statistics", 1959)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as Basis for Action" American Statistician Vol. 29 (4), 1975)

"When the statistician looks at the outside world, he cannot, for example, rely on finding errors that are independently and identically distributed in approximately normal distributions. In particular, most economic and business data are collected serially and can be expected, therefore, to be heavily serially dependent. So is much of the data collected from the automatic instruments which are becoming so common in laboratories these days. Analysis of such data, using procedures such as standard regression analysis which assume independence, can lead to gross error. Furthermore, the possibility of contamination of the error distribution by outliers is always present and has recently received much attention. More generally, real data sets, especially if they are long, usually show inhomogeneity in the mean, the variance, or both, and it is not always possible to randomize." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"The flaw in the classical thinking is the assumption that variance equals dispersion. Variance tends to exaggerate outlying data because it squares the distance between the data and their mean. This mathematical artifact gives too much weight to rotten apples. It can also result in an infinite value in the face of impulsive data or noise. [...] Yet dispersion remains an elusive concept. It refers to the width of a probability bell curve in the special but important case of a bell curve. But most probability curves don't have a bell shape. And its relation to a bell curve's width is not exact in general. We know in general only that the dispersion increases as the bell gets wider. A single number controls the dispersion for stable bell curves and indeed for all stable probability curves - but not all bell curves are stable curves."  (Bart Kosko, "Noise", 2006)

"A good estimator has to be more than just consistent. It also should be one whose variance is less than that of any other estimator. This property is called minimum variance. This means that if we run the experiment several times, the 'answers' we get will be closer to one another than 'answers' based on some other estimator." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"High-bias models typically produce simpler models that do not overfit and in those cases the danger is that of underfitting. Models with low-bias are typically more complex and that complexity enables us to represent the training data in a more accurate way. The danger here is that the flexibility provided by higher complexity may end up representing not only a relationship in the data but also the noise. Another way of portraying the bias-variance trade-off is in terms of complexity v simplicity." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017) 

"If either bias or variance is high, the model can be very far off from reality. In general, there is a trade-off between bias and variance. The goal of any machine-learning algorithm is to achieve low bias and low variance such that it gives good prediction performance. In reality, because of so many other hidden parameters in the model, it is hard to calculate the real bias and variance error. Nevertheless, the bias and variance provide a measure to understand the behavior of the machine-learning algorithm so that the model model can be adjusted to provide good prediction performance." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"The tension between bias and variance, simplicity and complexity, or underfitting and overfitting is an area in the data science and analytics process that can be closer to a craft than a fixed rule. The main challenge is that not only is each dataset different, but also there are data points that we have not yet seen at the moment of constructing the model. Instead, we are interested in building a strategy that enables us to tell something about data from the sample used in building the model." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017) 

"Two clouds of uncertainty may have the same center, but one may be much more dispersed than the other. We need a way of looking at the scatter about the center. We need a measure of the scatter. One such measure is the variance. We take each of the possible values of error and calculate the squared difference between that value and the center of the distribution. The mean of those squared differences is the variance." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Variance is a prediction error due to different sets of training samples. Ideally, the error should not vary from one training sample to another sample, and the model should be stable enough to handle hidden variations between input and output variables. Normally this occurs with the overfitted model." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Variance quantifies how accurately a model estimates the target variable if a different dataset is used to train the model. It quantifies whether the mathematical formulation of our model is a good generalization of the underlying patterns. Specific overfitted rules based on specific scenarios and situations = high variance, and rules that are generalized and applicable to a variety of scenarios and situations = low variance." (Imran Ahmad, "40 Algorithms Every Programmer Should Know", 2020)

29 November 2018

Data Science: Invariance (Just the Quotes)

"[…] as every law of nature implies the existence of an invariant, it follows that every law of nature is a constraint. […] Science looks for laws; it is therefore much concerned with looking for constraints. […] the world around us is extremely rich in constraints. We are so familiar with them that we take most of them for granted, and are often not even aware that they exist. […] A world without constraints would be totally chaotic." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"[...] the existence of any invariant over a set of phenomena implies a constraint, for its existence implies that the full range of variety does not occur. The general theory of invariants is thus a part of the theory of constraints. Further, as every law of nature implies the existence of an invariant, it follows that every law of nature is a constraint." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"Through all the meanings runs the basic idea of an 'invariant': that although the system is passing through a series of changes, there is some aspect that is unchanging; so some statement can be made that, in spite of the incessant changing, is true unchangingly." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"A satisfactory prediction of the sequential properties of learning data from a single experiment is by no means a final test of a model. Numerous other criteria - and some more demanding - can be specified. For example, a model with specific numerical parameter values should be invariant to changes in independent variables that explicitly enter in the model." (Robert R Bush & Frederick Mosteller,"A Comparison of Eight Models?", Studies in Mathematical Learning Theory, 1959)

"We know many laws of nature and we hope and expect to discover more. Nobody can foresee the next such law that will be discovered. Nevertheless, there is a structure in laws of nature which we call the laws of invariance. This structure is so far-reaching in some cases that laws of nature were guessed on the basis of the postulate that they fit into the invariance structure." (Eugene P Wigner, "The Role of Invariance Principles in Natural Philosophy", 1963)

"[..] principle of equipresence: A quantity present as an independent variable in one constitutive equation is so present in all, to the extent that its appearance is not forbidden by the general laws of Physics or rules of invariance. […] The principle of equipresence states, in effect, that no division of phenomena is to be laid down by constitutive equations." (Clifford Truesdell, "Six Lectures on Modern Natural Philosophy", 1966)

"It is now natural for us to try to derive the laws of nature and to test their validity by means of the laws of invariance, rather than to derive the laws of invariance from what we believe to be the laws of nature." (Eugene P Wigner, "Symmetries and Reflections", 1967)

"As a metaphor - and I stress that it is intended as a metaphor - the concept of an invariant that arises out of mutually or cyclically balancing changes may help us to approach the concept of self. In cybernetics this metaphor is implemented in the ‘closed loop’, the circular arrangement of feedback mechanisms that maintain a given value within certain limits. They work toward an invariant, but the invariant is achieved not by a steady resistance, the way a rock stands unmoved in the wind, but by compensation over time. Whenever we happen to look in a feedback loop, we find the present act pitted against the immediate past, but already on the way to being compensated itself by the immediate future. The invariant the system achieves can, therefore, never be found or frozen in a single element because, by its very nature, it consists in one or more relationships - and relationships are not in things but between them."  (Ernst von Glasersfeld German, "Cybernetics, Experience and the Concept of Self", 1970)

"An essential condition for a theory of choice that claims normative status is the principle of invariance: different representations of the same choice problem should yield the same preference. That is, the preference between options should be independent of their description. Two characterizations that the decision maker, on reflection, would view as alternative descriptions of the same problem should lead to the same choice-even without the benefit of such reflection." (Amos Tversky & Daniel Kahneman, "Rational Choice and the Framing of Decisions", The Journal of Business Vol. 59 (4), 1986)

"Axiomatic theories of choice introduce preference as a primitive relation, which is interpreted through specific empirical procedures such as choice or pricing. Models of rational choice assume a principle of procedure invariance, which requires strategically equivalent methods of elicitation to yield the same preference order." (Amos Tversky et al, "The Causes of Preference Reversal", The American Economic Review Vol. 80 (1), 1990)

"Symmetry is basically a geometrical concept. Mathematically it can be defined as the invariance of geometrical patterns under certain operations. But when abstracted, the concept applies to all sorts of situations. It is one of the ways by which the human mind recognizes order in nature. In this sense symmetry need not be perfect to be meaningful. Even an approximate symmetry attracts one's attention, and makes one wonder if there is some deep reason behind it." (Eguchi Tohru & ‎K Nishijima ," Broken Symmetry: Selected Papers Of Y Nambu", 1995)

"How deep truths can be defined as invariants – things that do not change no matter what; how invariants are defined by symmetries, which in turn define which properties of nature are conserved, no matter what. These are the selfsame symmetries that appeal to the senses in art and music and natural forms like snowflakes and galaxies. The fundamental truths are based on symmetry, and there’s a deep kind of beauty in that." (K C Cole, "The Universe and the Teacup: The Mathematics of Truth and Beauty", 1997)

"Cybernetics is the science of effective organization, of control and communication in animals and machines. It is the art of steersmanship, of regulation and stability. The concern here is with function, not construction, in providing regular and reproducible behaviour in the presence of disturbances. Here the emphasis is on families of solutions, ways of arranging matters that can apply to all forms of systems, whatever the material or design employed. [...] This science concerns the effects of inputs on outputs, but in the sense that the output state is desired to be constant or predictable – we wish the system to maintain an equilibrium state. It is applicable mostly to complex systems and to coupled systems, and uses the concepts of feedback and transformations (mappings from input to output) to effect the desired invariance or stability in the result." (Chris Lucas, "Cybernetics and Stochastic Systems", 1999)

"Each of the most basic physical laws that we know corresponds to some invariance, which in turn is equivalent to a collection of changes which form a symmetry group. […] whilst leaving some underlying theme unchanged. […] for example, the conservation of energy is equivalent to the invariance of the laws of motion with respect to translations backwards or forwards in time […] the conservation of linear momentum is equivalent to the invariance of the laws of motion with respect to the position of your laboratory in space, and the conservation of angular momentum to an invariance with respect to directional orientation… discovery of conservation laws indicated that Nature possessed built-in sustaining principles which prevented the world from just ceasing to be." (John D Barrow, "New Theories of Everything", 2007)

"The concept of symmetry (invariance) with its rigorous mathematical formulation and generalization has guided us to know the most fundamental of physical laws. Symmetry as a concept has helped mankind not only to define ‘beauty’ but also to express the ‘truth’. Physical laws tries to quantify the truth that appears to be ‘transient’ at the level of phenomena but symmetry promotes that truth to the level of ‘eternity’." (Vladimir G Ivancevic & Tijana T Ivancevic,"Quantum Leap", 2008)

"The concept of symmetry is used widely in physics. If the laws that determine relations between physical magnitudes and a change of these magnitudes in the course of time do not vary at the definite operations (transformations), they say, that these laws have symmetry (or they are invariant) with respect to the given transformations. For example, the law of gravitation is valid for any points of space, that is, this law is in variant with respect to the system of coordinates." (Alexey Stakhov et al, "The Mathematics of Harmony", 2009)

"In dynamical systems, a bifurcation occurs when a small smooth change made to the parameter values (the bifurcation parameters) of a system causes a sudden 'qualitative' or topological change in its behaviour. Generally, at a bifurcation, the local stability properties of equilibria, periodic orbits or other invariant sets changes." (Gregory Faye, "An introduction to bifurcation theory",  2011)

"Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. 'Structure' has long been understood as symmetry which can take many forms with respect to any transformation, including point, translational, rotational, and many others. Symmetries directly point to invariants, which pinpoint intrinsic properties of the data and of the background empirical domain of interest. As our data models change, so too do our perspectives on analysing data." (Fionn Murtagh, "Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics", 2018)

More quotes on "Invariance" at the-web-of-knowledge.blogspot.com.

06 May 2018

Data Science: Variance (Definitions)

"The mean squared deviation of the measured response values from their average value." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"The variance reflects the amount of variation in a set of observations." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"Describes dispersion about the data set’s mean. The variance is the square of the standard deviation. Conversely, the standard deviation is the square root of the variance." (E C Nelson & Stephen L Nelson, "Excel Data Analysis For Dummies ", 2015)

"Summary statistic that indicates the degree of variability among participants for a given variable. The variance is essentially the average squared deviation from the mean and is the square of the standard deviation." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"A statistical measure of how spread (or varying) the values of a variable are around a central value such as the mean." (Jonathan Ferrar et al, "The Power of People", 2017)
