16 December 2018

Data Science: Correlation (Just the Quotes)

"Reflection soon made it clear to me that not only were the two new problems identical in principle with the old one of kinship which I had already solved, but that all three of them were no more than special cases of a much more general problem - namely, that of Correlation." (Francis Galton, "Kinship and Correlation", 1890)

"It had appeared from observation, and it was fully confirmed by this theory, that such a thing existed as an 'Index of Correlation', that is to say, a fraction, now commonly written r, that connects with close approximation every value of the deviation on the part of the subject, with the average of all the associated deviations of the Relative [...]" (Francis Galton, "Memories of My Life", 1908)

"One of the main duties of science is the correlation of phenomena, apparently disconnected and even contradictory." (Frederick Soddy, "The Interpretation of Radium and the Structure of the Atom", 1909)

"To speak of the cause of an event is therefore misleading. Any set of antecedents from which the event can theoretically be inferred by means of correlations might be called a cause of the event. But to speak of the cause is to imply a uniqueness [...]." (Bertrand Russell, "Mysticism and Logic: And Other Essays", 1910)

"'Correlation' is a term used to express the relation which exists between two series or groups of data where there is a causal connection. In order to have correlation it is not enough that the two sets of data should both increase or decrease simultaneously. For correlation it is necessary that one set of facts should have some definite causal dependence upon the other set [...]" (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919) 

"'Causation' has been popularly used to express the condition of association, when applied to natural phenomena. There is no philosophical basis for giving it a wider meaning than partial or absolute association. In no case has it been proved that there is an inherent necessity in the laws of nature. Causation is correlation. [...] perfect correlation, when based upon sufficient experience, is causation in the scientific sense." (Henry E. Niles, "Correlation, Causation and Wright's Theory of 'Path Coefficients'", Genetics, 1922)

"The futile elaboration of innumerable measures of correlation, and the evasion of the real difficulties of sampling problems under cover of a contempt for small samples, were obviously beginning to make its pretensions ridiculous. These procedures were not only ill-aimed, but for all their elaboration, not sufficiently accurate." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation." (Frederick E Croxton & Dudley J Cowden, "Practical Business Statistics", 1937)

"Graphic methods are very commonly used in business correlation problems. On the whole, carefully handled and skillfully interpreted graphs have certain advantages over mathematical methods of determining correlation in the usual business problems. The elements of judgment and special knowledge of conditions can be more easily introduced in studying correlation graphically. Mathematical correlation is often much too rigid for the data at hand." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"[…] statistical literacy. That is, the ability to read diagrams and maps; a 'consumer' understanding of common statistical terms, as average, percent, dispersion, correlation, and index number." (Douglas Scates, "Statistics: The Mathematics for Social Problems", 1943)

"Another thing to watch out for is a conclusion in which a correlation has been inferred to continue beyond the data with which it has been demonstrated." (Darrell Huff, "How to Lie with Statistics", 1954)

"Keep in mind that a correlation may be real and based on real cause and effect, and still be almost worthless in determining action in any single case." (Darrell Huff, "How to Lie with Statistics", 1954)

"When you find somebody - usually an interested party - making a fuss about a correlation, look first of all to see if it is not one of this type, produced by the stream of events, the trend of the times." (Darrell Huff, "How to Lie with Statistics", 1954)

"There is no correlation between the cause and the effect. The events reveal only an aleatory determination, connected not so much with the imperfection of our knowledge as with the structure of the human world." (Raymond Aron, "The Opium of the Intellectuals", 1955)

"The well-known virtue of the experimental method is that it brings situational variables under tight control. It thus permits rigorous tests of hypotheses and confident statements about causation. The correlational method, for its part, can study what man has not learned to control. Nature has been experimenting since the beginning of time, with a boldness and complexity far beyond the resources of science. The correlator’s mission is to observe and organize the data of nature’s experiments." (Lee J Cronbach, "The Two Disciplines of Scientific Psychology", The American Psychologist Vol. 12, 1957)

"It has been said that data collection is like garbage collection: before you collect it you should have in mind what you are going to do with it." (Russell Fox & Max Gorbuny, "The Science of Science: Methods of Interpreting Physical Phenomena", 1964)

"Today we preach that science is not science unless it is quantitative. We substitute correlation for causal studies, and physical equations for organic reasoning. Measurements and equations are supposed to sharpen thinking, but [...] they more often tend to make the thinking non-causal and fuzzy." (John R Platt, "Strong Inference", Science Vol. 146 (3641), 1964)

"If we gather more and more data and establish more and more associations, however, we will not finally find that we know something. We will simply end up having more and more data and larger sets of correlations." (Kenneth N Waltz, "Theory of International Politics", 1979)

"The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning." (Stephen J Gould, "The Mismeasure of Man", 1980)

"Correlation analysis is a useful tool for uncovering a tenuous relationship, but it doesn't necessarily provide any real understanding of the relationship, and it certainly doesn't provide any evidence that the relationship is one of cause and effect. People who don't understand correlation tend to credit it with being a more fundamental approach than it is." (Robert Hooke, "How to Tell the Liars from the Statisticians", 1983)

"Only a 0 correlation is uninteresting, and in practice 0 correlations do not occur. When you stuff a bunch of numbers into the correlation formula, the chance of getting exactly 0, even if no correlation is truly present, is about the same as the chance of a tossed coin ending up on edge instead of heads or tails." (Robert Hooke, "How to Tell the Liars from the Statisticians", 1983)

"Correlation and causation are two quite different words, and the innumerate are more prone to mistake them than most." (John A Paulos, "Innumeracy: Mathematical Illiteracy and its Consequences", 1988)

"Nature normally hates power laws. In ordinary systems all quantities follow bell curves, and correlations decay rapidly, obeying exponential laws. But all that changes if the system is forced to undergo a phase transition. Then power laws emerge - nature's unmistakable sign that chaos is departing in favor of order. The theory of phase transitions told us loud and clear that the road from disorder to order is maintained by the powerful forces of self-organization and is paved by power laws. It told us that power laws are not just another way of characterizing a system's behavior. They are the patent signatures of self-organization in complex systems." (Albert-László Barabási, "Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life", 2002)

"If you flip a coin three times and it lands on heads each time, it's probably chance. If you flip it a hundred times and it lands on heads each time, you can be pretty sure the coin has heads on both sides. That's the concept behind statistical significance - it's the odds that the correlation (or other finding) is real, that it isn't just random chance." (T Colin Campbell, "The China Study", 2004)
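The arithmetic behind the coin example can be sketched as follows. This is only an illustration, assuming a fair coin and independent tosses; the function name `prob_all_heads` is hypothetical and does not come from the quoted book. It computes how likely a run of heads would be by chance alone, which is the quantity a significance test weighs.

```python
from fractions import Fraction


def prob_all_heads(n_flips: int) -> Fraction:
    """Probability that a fair coin lands heads on every one of n_flips
    independent tosses (assumption: P(heads) = 1/2 on each toss)."""
    if n_flips < 0:
        raise ValueError("n_flips must be non-negative")
    return Fraction(1, 2) ** n_flips


if __name__ == "__main__":
    # Compare the two cases from the quote: 3 flips versus 100 flips.
    for n in (3, 100):
        p = prob_all_heads(n)
        print(f"{n} heads in a row with a fair coin: {float(p):.3g}")
```

Three heads in a row happen by chance one time in eight, while a hundred in a row are so improbable under a fair coin that a two-headed coin becomes the more plausible explanation.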

"Nonetheless, the basic principles regarding correlations between variables are not that difficult to understand. We must look for patterns that reveal potential relationships and for evidence that variables are actually related. But when we do spot those relationships, we should not jump to conclusions about causality. Instead, we need to weigh the strength of the relationship and the plausibility of our theory, and we must always try to discount the possibility of spuriousness." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"Before best estimates are extracted from data sets by way of a regression analysis, the uncertainties of the individual data values must be determined. In this case care must be taken to recognize which uncertainty components are common to all the values, i.e., those that are correlated (systematic)." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Correlation analysis can help us find the size of the formal relation between two properties. An equidirectional variation is present if we observe high values of one variable together with high values of the other variable (or low ones combined with low ones). In this case there is a positive correlation. If high values are combined with low values and low values with high values, the variation is counterdirectional, and the correlation is negative." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In error analysis the so-called 'chi-squared' is a measure of the agreement between the uncorrelated internal and the external uncertainties of a measured functional relation. The simplest such relation would be time independence. Theory of the chi-squared requires that the uncertainties be normally distributed. Nevertheless, it was found that the test can be applied to most probability distributions encountered in practice." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is important that uncertainty components that are independent of each other are added quadratically. This is also true for correlated uncertainty components, provided they are independent of each other, i.e., as long as there is no correlation between the components." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)
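Quadratic addition, as described in the quote above, means taking the square root of the sum of the squared components. A minimal sketch, assuming all components are mutually independent and expressed in the same unit; the function name `combine_in_quadrature` is hypothetical:

```python
import math
from typing import Iterable


def combine_in_quadrature(components: Iterable[float]) -> float:
    """Combine independent uncertainty components by quadratic addition:
    u_total = sqrt(u_1^2 + u_2^2 + ...).
    Assumption: no correlation between the components."""
    return math.sqrt(sum(u * u for u in components))


if __name__ == "__main__":
    # Two independent components of 3 and 4 units combine to 5 units,
    # not to 7 as linear addition would suggest.
    print(combine_in_quadrature([3.0, 4.0]))
```

The combined value is always dominated by the largest component, which is why small independent contributions often change the total very little.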

"The fact that the same uncertainty (e.g., scale uncertainty) is uncorrelated if we are dealing with only one measurement, but correlated (i.e., systematic) if we look at more than one measurement using the same instrument shows that both types of uncertainties are of the same nature. Of course, an uncertainty keeps its characteristics (e.g., Poisson distributed), independent of the fact whether it occurs only once or more often." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"[...] if you want to show change through time, use a time-series chart; if you need to compare, use a bar chart; or to display correlation, use a scatter-plot - because some of these rules make good common sense." (Alberto Cairo, "The Functional Art", 2011)

"Economists should study financial markets as they actually operate, not as they assume them to operate - observing the way in which information is actually processed, observing the serial correlations, bonanzas, and sudden stops, not assuming these away as noise around the edges of efficient and rational markets." (Adair Turner, "Economics after the Crisis: Objectives and means", 2012)

"Without precise predictability, control is impotent and almost meaningless. In other words, the lesser the predictability, the harder the entity or system is to control, and vice versa. If our universe actually operated on linear causality, with no surprises, uncertainty, or abrupt changes, all future events would be absolutely predictable in a sort of waveless orderliness." (Lawrence K Samuels, "Defense of Chaos", 2013)

"The problem of complexity is at the heart of mankind’s inability to predict future events with any accuracy. Complexity science has demonstrated that the more factors found within a complex system, the more chances of unpredictable behavior. And without predictability, any meaningful control is nearly impossible. Obviously, this means that you cannot control what you cannot predict. The ability ever to predict long-term events is a pipedream. Mankind has little to do with changing climate; complexity does." (Lawrence K Samuels, "The Real Science Behind Changing Climate", LewRockwell.com, August 1, 2014)

"The correlational technique known as multiple regression is used frequently in medical and social science research. This technique essentially correlates many independent (or predictor) variables simultaneously with a given dependent variable (outcome or output). It asks, 'Net of the effects of all the other variables, what is the effect of variable A on the dependent variable?' Despite its popularity, the technique is inherently weak and often yields misleading results. The problem is due to self-selection. If we don’t assign cases to a particular treatment, the cases may differ in any number of ways that could be causing them to differ along some dimension related to the dependent variable. We can know that the answer given by a multiple regression analysis is wrong because randomized control experiments, frequently referred to as the gold standard of research techniques, may give answers that are quite different from those obtained by multiple regression analysis." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"The theory behind multiple regression analysis is that if you control for everything that is related to the independent variable and the dependent variable by pulling their correlations out of the mix, you can get at the true causal relation between the predictor variable and the outcome variable. That’s the theory. In practice, many things prevent this ideal case from being the norm." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"A correlation is simply a bivariate relationship - a fancy way of saying that there is a relationship between two ('bi') variables ('variate'). And a bivariate relationship doesn’t prove that one thing caused the other. Think of it this way: you can observe that two things appear to be related statistically, but that doesn’t tell you the answer to any of the questions you might really care about - why is there a relationship and what does it mean to us as a consumer of data?" (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Confirmation bias can affect nearly every aspect of the way you look at data, from sampling and observation to forecasting - so it’s something to keep in mind anytime you’re interpreting data. When it comes to correlation versus causation, confirmation bias is one reason that some people ignore omitted variables - because they’re making the jump from correlation to causation based on preconceptions, not the actual evidence." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"In the real world, statistical issues rarely exist in isolation. You’re going to come across cases where there’s more than one problem with the data. For example, just because you identify some sampling errors doesn’t mean there aren’t also issues with cherry picking and correlations and averages and forecasts - or simply more sampling issues, for that matter. Some cases may have no statistical issues, some may have dozens. But you need to keep your eyes open in order to spot them all." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Correlation is not equivalent to cause for one major reason. Correlation is well defined in terms of a mathematical formula. Cause is not well defined." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The degree to which one variable can be predicted from another can be calculated as the correlation between them. The square of the correlation (R^2) is the proportion of the variance of one that can be 'explained' by knowledge of the other." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"It is convenient to use a single number to summarize a steadily increasing or decreasing relationship between the pairs of numbers shown on a scatter-plot. This is generally chosen to be the Pearson correlation coefficient [...]. A Pearson correlation runs between −1 and 1, and expresses how close to a straight line the dots or data-points fall. A correlation of 1 occurs if all the points lie on a straight line going upwards, while a correlation of −1 occurs if all the points lie on a straight line going downwards. A correlation near 0 can come from a random scatter of points, or any other pattern in which there is no systematic trend upwards or downwards [...]." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)
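The three cases described above (a rising line, a falling line, and a pattern with no upward or downward trend) can be reproduced with a small hand-written Pearson coefficient. This is a minimal sketch using the standard formula; the function name `pearson_r` and the sample points are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences:
    sum of products of deviations divided by the square root of the
    product of the two sums of squared deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2]
    print(pearson_r(xs, [1, 2, 3, 4, 5]))   # points on a rising straight line
    print(pearson_r(xs, [5, 4, 3, 2, 1]))   # points on a falling straight line
    print(pearson_r(xs, [4, 1, 0, 1, 4]))   # U-shaped pattern, no overall trend
```

The last case shows a strong but symmetric relationship that the coefficient does not register, since the coefficient only measures how closely the points follow a straight line.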

"Another problem is that while data visualizations may appear to be objective, the designer has a great deal of control over the message a graphic conveys. Even using accurate data, a designer can manipulate how those data make us feel. She can create the illusion of a correlation where none exists, or make a small difference between groups look big." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Correlation doesn't imply causation - but apparently it doesn't sell newspapers either." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Correlation quantifies the relationship between features. The purpose of correlation analysis is to understand the dependencies between features, so that observed effects can be explained or desired effects can be achieved." (Thomas A Runkler, "Data Analytics: Models and Algorithms for Intelligent Data Analysis" 3rd Ed., 2020)

"Correlation does not imply causation: often some other missing third variable is influencing both of the variables you are correlating. […] The need for a scatterplot arose when scientists had to examine bivariate relations between distinct variables directly. As opposed to other graphic forms - pie charts, line graphs, and bar charts - the scatterplot offered a unique advantage: the possibility to discover regularity in empirical data (shown as points) by adding smoothed lines or curves designed to pass 'not through, but among them', so as to pass from raw data to a theory-based description, analysis, and understanding." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"The practice of finding relationships between different sets of data - also known as correlations - is the bread and butter of what data analysis, and by proxy data visualization, is all about." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

More quotes on "Correlation" at the-web-of-knowledge.blogspot.com
