30 November 2018

Data Science: Control (Just the Quotes)

"An inference, if it is to have scientific value, must constitute a prediction concerning future data. If the inference is to be made purely with the help of the distribution theory of statistics, the experiments that constitute evidence for the inference must arise from a state of statistical control; until that state is reached, there is no universe, normal or otherwise, and the statistician’s calculations by themselves are an illusion if not a delusion. The fact is that when distribution theory is not applicable for lack of control, any inference, statistical or otherwise, is little better than a conjecture. The state of statistical control is therefore the goal of all experimentation. (William E Deming, "Statistical Method from the Viewpoint of Quality Control", 1939)

"Sampling is the science and art of controlling and measuring the reliability of useful statistical information through the theory of probability." (William E Deming, "Some Theory of Sampling", 1950)

"The well-known virtue of the experimental method is that it brings situational variables under tight control. It thus permits rigorous tests of hypotheses and confidential statements about causation. The correlational method, for its part, can study what man has not learned to control. Nature has been experimenting since the beginning of time, with a boldness and complexity far beyond the resources of science. The correlator’s mission is to observe and organize the data of nature’s experiments." (Lee J Cronbach, "The Two Disciplines of Scientific Psychology", The American Psychologist Vol. 12, 1957)

"In complex systems cause and effect are often not closely related in either time or space. The structure of a complex system is not a simple feedback loop where one system state dominates the behavior. The complex system has a multiplicity of interacting feedback loops. Its internal rates of flow are controlled by nonlinear relationships. The complex system is of high order, meaning that there are many system states (or levels). It usually contains positive-feedback loops describing growth processes as well as negative, goal-seeking loops. In the complex system the cause of a difficulty may lie far back in time from the symptoms, or in a completely different and remote part of the system. In fact, causes are usually found, not in prior events, but in the structure and policies of the system." (Jay W Forrester, "Urban dynamics", 1969)

"To adapt to a changing environment, the system needs a variety of stable states that is large enough to react to all perturbations but not so large as to make its evolution uncontrollably chaotic. The most adequate states are selected according to their fitness, either directly by the environment, or by subsystems that have adapted to the environment at an earlier stage. Formally, the basic mechanism underlying self-organization is the (often noise-driven) variation which explores different regions in the system’s state space until it enters an attractor. This precludes further variation outside the attractor, and thus restricts the freedom of the system’s components to behave independently. This is equivalent to the increase of coherence, or decrease of statistical entropy, that defines self-organization." (Francis Heylighen, "The Science Of Self-Organization And Adaptivity", 1970)

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"Thus, the construction of a mathematical model consisting of certain basic equations of a process is not yet sufficient for effecting optimal control. The mathematical model must also provide for the effects of random factors, the ability to react to unforeseen variations and ensure good control despite errors and inaccuracies." (Yakov Khurgin, "Did You Say Mathematics?", 1974)

"Uncontrolled variation is the enemy of quality." (W Edwards Deming, 1980)

"The methods of science include controlled experiments, classification, pattern recognition, analysis, and deduction. In the humanities we apply analogy, metaphor, criticism, and (e)valuation. In design we devise alternatives, form patterns, synthesize, use conjecture, and model solutions." (Béla H Bánáthy, "Designing Social Systems in a Changing World", 1996)

"A mathematical model uses mathematical symbols to describe and explain the represented system. Normally used to predict and control, these models provide a high degree of abstraction but also of precision in their application." (Lars Skyttner, "General Systems Theory: Ideas and Applications", 2001)

"A model is an imitation of reality and a mathematical model is a particular form of representation. We should never forget this and get so distracted by the model that we forget the real application which is driving the modelling. In the process of model building we are translating our real world problem into an equivalent mathematical problem which we solve and then attempt to interpret. We do this to gain insight into the original real world situation or to use the model for control, optimization or possibly safety studies." (Ian T Cameron & Katalin Hangos, "Process Modelling and Model Analysis", 2001)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"The methodology of feedback design is borrowed from cybernetics (control theory). It is based upon methods of controlled system model’s building, methods of system states and parameters estimation (identification), and methods of feedback synthesis. The models of controlled system used in cybernetics differ from conventional models of physics and mechanics in that they have explicitly specified inputs and outputs. Unlike conventional physics results, often formulated as conservation laws, the results of cybernetical physics are formulated in the form of transformation laws, establishing the possibilities and limits of changing properties of a physical system by means of control." (Alexander L Fradkov, "Cybernetical Physics: From Control of Chaos to Quantum Control", 2007)

"Put simply, statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data. […] Essentially […], statistics is a scientific approach to analyzing numerical data in order to enable us to maximize our interpretation, understanding and use. This means that statistics helps us turn data into information; that is, data that have been interpreted, understood and are useful to the recipient. Put formally, for your project, statistics is the systematic collection and analysis of numerical data, in order to investigate or discover relationships among phenomena so as to explain, predict and control their occurrence." (Reva B Brown & Mark Saunders, "Dealing with Statistics: What You Need to Know", 2008)

"One technique employing correlational analysis is multiple regression analysis (MRA), in which a number of independent variables are correlated simultaneously (or sometimes sequentially, but we won’t talk about that variant of MRA) with some dependent variable. The predictor variable of interest is examined along with other independent variables that are referred to as control variables. The goal is to show that variable A influences variable B 'net of' the effects of all the other variables. That is to say, the relationship holds even when the effects of the control variables on the dependent variable are taken into account." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"The correlational technique known as multiple regression is used frequently in medical and social science research. This technique essentially correlates many independent (or predictor) variables simultaneously with a given dependent variable (outcome or output). It asks, 'Net of the effects of all the other variables, what is the effect of variable A on the dependent variable?' Despite its popularity, the technique is inherently weak and often yields misleading results. The problem is due to self-selection. If we don’t assign cases to a particular treatment, the cases may differ in any number of ways that could be causing them to differ along some dimension related to the dependent variable. We can know that the answer given by a multiple regression analysis is wrong because randomized control experiments, frequently referred to as the gold standard of research techniques, may give answers that are quite different from those obtained by multiple regression analysis." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"The theory behind multiple regression analysis is that if you control for everything that is related to the independent variable and the dependent variable by pulling their correlations out of the mix, you can get at the true causal relation between the predictor variable and the outcome variable. That’s the theory. In practice, many things prevent this ideal case from being the norm." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"Too little attention is given to the need for statistical control, or to put it more pertinently, since statistical control (randomness) is so rarely found, too little attention is given to the interpretation of data that arise from conditions not in statistical control." (William E Deming)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.