25 December 2018

Data Science: Variation (Just the Quotes)

"There is a maxim which is often quoted, that ‘The same causes will always produce the same effects.’ To make this maxim intelligible we must define what we mean by the same causes and the same effects, since it is manifest that no event ever happens more that once, so that the causes and effects cannot be the same in all respects. [...] There is another maxim which must not be confounded with that quoted at the beginning of this article, which asserts ‘That like causes produce like effects’. This is only true when small variations in the initial circumstances produce only small variations in the final state of the system. In a great many physical phenomena this condition is satisfied; but there are other cases in which a small initial variation may produce a great change in the final state of the system, as when the displacement of the ‘points’ causes a railway train to run into another instead of keeping its proper course." (James C Maxwell, "Matter and Motion", 1876)

"Statistics may be regarded as (i) the study of populations, (ii) as the study of variation, and (iii) as the study of methods of the reduction of data." (Sir Ronald A Fisher, "Statistical Methods for Research Worker", 1925)

"The conception of statistics as the study of variation is the natural outcome of viewing the subject as the study of populations; for a population of individuals in all respects identical is completely described by a description of anyone individual, together with the number in the group. The populations which are the object of statistical study always display variations in one or more respects. To speak of statistics as the study of variation also serves to emphasise the contrast between the aims of modern statisticians and those of their predecessors." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"Postulate 1. All chance systems of causes are not alike in the sense that they enable us to predict the future in terms of the past. Postulate 2. Constant systems of chance causes do exist in nature. Postulate 3. Assignable causes of variation may be found and eliminated."(Walter A Shewhart, "Economic Control of Quality of Manufactured Product", 1931)

"In our definition of system we noted that all systems have interrelationships between objects and between their attributes. If every part of the system is so related to every other part that any change in one aspect results in dynamic changes in all other parts of the total system, the system is said to behave as a whole or coherently. At the other extreme is a set of parts that are completely unrelated: that is, a change in each part depends only on that part alone. The variation in the set is the physical sum of the variations of the parts. Such behavior is called independent or physical summativity." (Arthur D Hall & Robert E Fagen, "Definition of System", General Systems Vol. 1, 1956)

"The statistics themselves prove nothing; nor are they at any time a substitute for logical thinking. There are […] many simple but not always obvious snags in the data to contend with. Variations in even the simplest of figures may conceal a compound of influences which have to be taken into account before any conclusions are drawn from the data." (Alfred R Ilersic, "Statistics", 1959)

"Adaptive system - whether on the biological, psychological, or sociocultural level - must manifest (1) some degree of 'plasticity' and 'irritability' vis-a-vis its environment such that it carries on a constant interchange with acting on and reacting to it; (2) some source or mechanism for variety, to act as a potential pool of adaptive variability to meet the problem of mapping new or more detailed variety and constraints in a changeable environment; (3) a set of selective criteria or mechanisms against which the 'variety pool' may be sifted into those variations in the organization or system that more closely map the environment and those that do not; and (4) an arrangement for preserving and/or propagating these 'successful' mappings." (Walter F Buckley," Sociology and modern systems theory", 1967)

"To adapt to a changing environment, the system needs a variety of stable states that is large enough to react to all perturbations but not so large as to make its evolution uncontrollably chaotic. The most adequate states are selected according to their fitness, either directly by the environment, or by subsystems that have adapted to the environment at an earlier stage. Formally, the basic mechanism underlying self-organization is the (often noise-driven) variation which explores different regions in the system’s state space until it enters an attractor. This precludes further variation outside the attractor, and thus restricts the freedom of the system’s components to behave independently. This is equivalent to the increase of coherence, or decrease of statistical entropy, that defines self-organization." (Francis Heylighen, "The Science Of Self-Organization And Adaptivity", 1970)

"Thus, the construction of a mathematical model consisting of certain basic equations of a process is not yet sufficient for effecting optimal control. The mathematical model must also provide for the effects of random factors, the ability to react to unforeseen variations and ensure good control despite errors and inaccuracies." (Yakov Khurgin, "Did You Say Mathematics?", 1974)

"The greater the uncertainty, the greater the amount of decision making and information processing. It is hypothesized that organizations have limited capacities to process information and adopt different organizing modes to deal with task uncertainty. Therefore, variations in organizing modes are actually variations in the capacity of organizations to process information and make decisions about events which cannot be anticipated in advance." (John K Galbraith, "Organization Design", 1977)

"Uncontrolled variation is the enemy of quality." (W Edwards Deming, 1980)

"A good description of the data summarizes the systematic variation and leaves residuals that look structureless. That is, the residuals exhibit no patterns and have no exceptionally large values, or outliers. Any structure present in the residuals indicates an inadequate fit. Looking at the residuals laid out in an overlay helps to spot patterns and outliers and to associate them with their source in the data." (Christopher H Schrnid, "Value Splitting: Taking the Data Apart", 1991)

"A useful description relates the systematic variation to one or more factors; if the residuals dwarf the effects for a factor, we may not be able to relate variation in the data to changes in the factor. Furthermore, changes in the factor may bring no important change in the response. Such comparisons of residuals and effects require a measure of the variation of overlays relative to each other." (Christopher H Schrnid, "Value Splitting: Taking the Data Apart", 1991)

"The term chaos is used in a specific sense where it is an inherently random pattern of behaviour generated by fixed inputs into deterministic (that is fixed) rules (relationships). The rules take the form of non-linear feedback loops. Although the specific path followed by the behaviour so generated is random and hence unpredictable in the long-term, it always has an underlying pattern to it, a 'hidden' pattern, a global pattern or rhythm. That pattern is self-similarity, that is a constant degree of variation, consistent variability, regular irregularity, or more precisely, a constant fractal dimension. Chaos is therefore order (a pattern) within disorder (random behaviour)." (Ralph D Stacey, "The Chaos Frontier: Creative Strategic Control for Business", 1991)

"Fitting data means finding mathematical descriptions of structure in the data. An additive shift is a structural property of univariate data in which distributions differ only in location and not in spread or shape. […] The process of identifying a structure in data and then fitting the structure to produce residuals that have the same distribution lies at the heart of statistical analysis. Such homogeneous residuals can be pooled, which increases the power of the description of the variation in the data." (William S Cleveland, "Visualizing Data", 1993)

"Probabilistic inference is the classical paradigm for data analysis in science and technology. It rests on a foundation of randomness; variation in data is ascribed to a random process in which nature generates data according to a probability distribution. This leads to a codification of uncertainly by confidence intervals and hypothesis tests." (William S Cleveland, "Visualizing Data", 1993)

"Swarm systems generate novelty for three reasons: (1) They are 'sensitive to initial conditions' - a scientific shorthand for saying that the size of the effect is not proportional to the size of the cause - so they can make a surprising mountain out of a molehill. (2) They hide countless novel possibilities in the exponential combinations of many interlinked individuals. (3) They don’t reckon individuals, so therefore individual variation and imperfection can be allowed. In swarm systems with heritability, individual variation and imperfection will lead to perpetual novelty, or what we call evolution." (Kevin Kelly, "Out of Control: The New Biology of Machines, Social Systems and the Economic World", 1995)

"Unlike classical mathematics, net math exhibits nonintuitive traits. In general, small variations in input in an interacting swarm can produce huge variations in output. Effects are disproportional to causes - the butterfly effect." (Kevin Kelly, "Out of Control: The New Biology of Machines, Social Systems and the Economic World", 1995)

"When visualization tools act as a catalyst to early visual thinking about a relatively unexplored problem, neither the semantics nor the pragmatics of map signs is a dominant factor. On the other hand, syntactics (or how the sign-vehicles, through variation in the visual variables used to construct them, relate logically to one another) are of critical importance." (Alan M MacEachren, "How Maps Work: Representation, Visualization, and Design", 1995)

"Statistics is a general intellectual method that applies wherever data, variation, and chance appear. It is a fundamental method because data, variation and chance are omnipresent in modern life. It is an independent discipline with its own core ideas rather than, for example, a branch of mathematics. […] Statistics offers general, fundamental, and independent ways of thinking." (David S Moore, "Statistics among the Liberal Arts", Journal of the American Statistical Association, 1998)

"No comparison between two values can be global. A simple comparison between the current figure and some previous value and convey the behavior of any time series. […] While it is simple and easy to compare one number with another number, such comparisons are limited and weak. They are limited because of the amount of data used, and they are weak because both of the numbers are subject to the variation that is inevitably present in weak world data. Since both the current value and the earlier value are subject to this variation, it will always be difficult to determine just how much of the difference between the values is due to variation in the numbers, and how much, if any, of the difference is due to real changes in the process." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"When a process displays unpredictable behavior, you can most easily improve the process and process outcomes by identifying the assignable causes of unpredictable variation and removing their effects from your process." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Statistics is the analysis of variation. There are many sources and kinds of variation. In environmental studies it is particularly important to understand the kinds of variation and the implications of the difference. Two important categories are variability and uncertainty. Variability refers to variation in environmental quantities (which may have special regulatory interest), uncertainty refers to the degree of precision with which these quantities are estimated." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"When there is more than one source of variation it is important to identify those sources." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Thus, nonlinearity can be understood as the effect of a causal loop, where effects or outputs are fed back into the causes or inputs of the process. Complex systems are characterized by networks of such causal loops. In a complex, the interdependencies are such that a component A will affect a component B, but B will in general also affect A, directly or indirectly.  A single feedback loop can be positive or negative. A positive feedback will amplify any variation in A, making it grow exponentially. The result is that the tiniest, microscopic difference between initial states can grow into macroscopically observable distinctions." (Carlos Gershenson, "Design and Control of Self-organizing Systems", 2007)

"All forms of complex causation, and especially nonlinear transformations, admittedly stack the deck against prediction. Linear describes an outcome produced by one or more variables where the effect is additive. Any other interaction is nonlinear. This would include outcomes that involve step functions or phase transitions. The hard sciences routinely describe nonlinear phenomena. Making predictions about them becomes increasingly problematic when multiple variables are involved that have complex interactions. Some simple nonlinear systems can quickly become unpredictable when small variations in their inputs are introduced." (Richard N Lebow, "Forbidden Fruit: Counterfactuals and International Relations", 2010)

"Statistics is the scientific discipline that provides methods to help us make sense of data. […] The field of statistics teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation." (Roxy Peck & Jay L Devore, "Statistics: The Exploration and Analysis of Data" 7th Ed, 2012)

"A signal is a useful message that resides in data. Data that isn’t useful is noise. […] When data is expressed visually, noise can exist not only as data that doesn’t inform but also as meaningless non-data elements of the display (e.g. irrelevant attributes, such as a third dimension of depth in bars, color variation that has no significance, and artificial light and shadow effects)." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"One of the most common problems that you will encounter when training deep neural networks will be overfitting. What can happen is that your network may, owing to its flexibility, learn patterns that are due to noise, errors, or simply wrong data. [...] The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e., the noise) as if that variation represented the underlying model structure. The opposite is called underfitting - when the model cannot capture the structure of the data." (Umberto Michelucci, "Applied Deep Learning: A Case-Based Approach to Understanding Deep Neural Networks", 2018)

"We over-fit when we go too far in adapting to local circumstances, in a worthy but misguided effort to be ‘unbiased’ and take into account all the available information. Usually we would applaud the aim of being unbiased, but this refinement means we have less data to work on, and so the reliability goes down. Over-fitting therefore leads to less bias but at a cost of more uncertainty or variation in the estimates, which is why protection against over-fitting is sometimes known as the bias/variance trade-off." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.