22 November 2018

Data Science: Regression toward the Mean (Just the Quotes)

"Whenever we make any decision based on the expectation that matters will return to 'normal', we are employing the notion of regression to the mean." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Regression to the mean occurs when the process produces results that are statistically independent or negatively correlated. With strong negative serial correlation, extremes are likely to be reversed each time (which would reinforce the instructors' error). In contrast, with strong positive dependence, extreme results are quite likely to be clustered together." (Dan Trietsch, "Statistical Quality Control: A loss minimization approach", 1998)
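The three cases in the quote above can be tried out with a small simulation. This is only a sketch under stated assumptions, not Trietsch's own method: the series is a stationary AR(1) process with unit variance, and the function name `mean_after_extreme`, the 1.5 threshold, and the correlation values are illustrative choices that do not come from the book.

```python
import random
import statistics


def mean_after_extreme(phi, n=50000, threshold=1.5, seed=0):
    """Average of the values that directly follow an observation above
    `threshold` in a simulated AR(1) series with lag-1 correlation `phi`.
    (Hypothetical helper; all parameters are illustrative assumptions.)"""
    rng = random.Random(seed)
    scale = (1.0 - phi ** 2) ** 0.5  # keeps the marginal variance at 1
    x = rng.gauss(0.0, 1.0)
    followers = []
    for _ in range(n):
        nxt = phi * x + scale * rng.gauss(0.0, 1.0)
        if x > threshold:
            followers.append(nxt)  # record what came right after an extreme
        x = nxt
    return statistics.fmean(followers)


negative = mean_after_extreme(-0.8)    # strong negative serial correlation
independent = mean_after_extreme(0.0)  # statistically independent results
positive = mean_after_extreme(0.8)     # strong positive dependence

print("after an extreme high, the next value averages:")
print(f"  negative correlation: {negative:.2f}")
print(f"  independent results:  {independent:.2f}")
print(f"  positive dependence:  {positive:.2f}")
```

Under these assumptions, the value after an extreme high should average well below the mean in the negatively correlated case, near the mean in the independent case, and well above the mean in the positively dependent case.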

"Unfortunately, people are poor intuitive scientists, generally failing to reason in accordance with the principles of scientific method. For example, people do not generate sufficient alternative explanations or consider enough rival hypotheses. People generally do not adequately control for confounding variables when they explore a novel environment. People’s judgments are strongly affected by the frame in which the information is presented, even when the objective information is unchanged. People suffer from overconfidence in their judgments (underestimating uncertainty), wishful thinking (assessing desired outcomes as more likely than undesired outcomes), and the illusion of control (believing one can predict or influence the outcome of random events). People violate basic rules of probability, do not understand basic statistical concepts such as regression to the mean, and do not update beliefs according to Bayes’ rule. Memory is distorted by hindsight, the availability and salience of examples, and the desirability of outcomes. And so on." (John D Sterman, "Business Dynamics: Systems thinking and modeling for a complex world", 2000)

"People often attribute meaning to phenomena governed only by a regression to the mean, the mathematical tendency for an extreme value of an at least partially chance-dependent quantity to be followed by a value closer to the average. Sports and business are certainly chancy enterprises and thus subject to regression. So is genetics to an extent, and so very tall parents can be expected to have offspring who are tall, but probably not as tall as they are. A similar tendency holds for the children of very short parents." (John A Paulos, "A Mathematician Plays the Stock Market", 2003)

"'Regression to the mean' […] says that, in any series of events where chance is involved, very good or bad performances, high or low scores, extreme events, etc. tend on the average, to be followed by more average performance or less extreme events. If we do extremely well, we're likely to do worse the next time, while if we do poorly, we're likely to do better the next time. But regression to the mean is not a natural law. Merely a statistical tendency. And it may take a long time before it happens." (Peter Bevelin, "Seeking Wisdom: From Darwin to Munger", 2003)

"Another aspect of representativeness that is misunderstood or ignored is the tendency of regression to the mean. Stochastic phenomena where the outcomes vary randomly around stable values (so-called stationary processes) exhibit the general tendency that extreme outcomes are more likely to be followed by an outcome closer to the mean or mode than by other extreme values in the same direction. For example, even a bright student will observe that her or his performance in a test following an especially outstanding outcome tends to be less brilliant. Similarly, extremely low or extremely high sales in a given period tend to be followed by sales that are closer to the stable mean or the stable trend." (Hans G Daellenbach & Donald C McNickle, "Management Science: Decision making through systems thinking", 2005)

"Behavioural research shows that we tend to use simplifying heuristics when making judgements about uncertain events. These are prone to biases and systematic errors, such as stereotyping, disregard of sample size, disregard for regression to the mean, deriving estimates based on the ease of retrieving instances of the event, anchoring to the initial frame, the gambler’s fallacy, and wishful thinking, which are all affected by our inability to consider more than a few aspects or dimensions of any phenomenon or situation at the same time." (Hans G Daellenbach & Donald C McNickle, "Management Science: Decision making through systems thinking", 2005)

"Concluding that the population is becoming more centralized by observing behavior at the extremes is called the 'Regression to the Mean' Fallacy. […] When looking for a change in a population, do not look only at the extremes; there you will always find a motion to the mean. Look at the entire population." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"'Regression to the mean' describes a natural phenomenon whereby, after a short period of success, things tend to return to normal immediately afterwards. This notion applies particularly to random events." (Alan Graham, "Developing Thinking in Statistics", 2006)

"regression to the mean: The fact that unexpectedly high or low numbers from the mean are an exception and are usually followed by numbers that are closer to the mean. Over the long haul, we tend to get relatively more numbers that are near the mean compared to numbers that are far from the mean." (Hari Singh, "Framed! Solve an Intriguing Mystery and Master How to Make Smart Choices", 2006)

"A naive interpretation of regression to the mean is that heights, or baseball records, or other variable phenomena necessarily become more and more 'average' over time. This view is mistaken because it ignores the error in the regression predicting y from x. For any data point xi, the point prediction for its yi will be regressed toward the mean, but the actual yi that is observed will not be exactly where it is predicted. Some points end up falling closer to the mean and some fall further." (Andrew Gelman & Jennifer Hill, "Data Analysis Using Regression and Multilevel/Hierarchical Models", 2007)
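The point made by Gelman and Hill can be illustrated with a short simulation. This is a rough sketch rather than anything taken from the book: it assumes standardized x and y with a correlation of 0.5, and the helper `simulate_pairs`, the sample size, and the 1.5 cutoff are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_pairs(n=20000, rho=0.5, seed=0):
    """Draw (x, y) pairs that each have mean 0 and variance 1 and
    correlation `rho` (all parameter values are assumptions)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        error = rng.gauss(0.0, 1.0)  # the regression error the quote refers to
        y = rho * x + (1.0 - rho ** 2) ** 0.5 * error
        pairs.append((x, y))
    return pairs


pairs = simulate_pairs()
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

# Among points with an extreme x, the average y is pulled toward the mean.
extreme = [(x, y) for x, y in pairs if x > 1.5]
mean_x_extreme = statistics.fmean([x for x, _ in extreme])
mean_y_extreme = statistics.fmean([y for _, y in extreme])

# The population as a whole does not become more 'average'.
sd_x = statistics.pstdev(xs)
sd_y = statistics.pstdev(ys)

# Individual points: share whose y lies further from the mean than their x.
share_further = sum(1 for x, y in pairs if abs(y) > abs(x)) / len(pairs)

print(f"mean x among extreme points: {mean_x_extreme:.2f}")
print(f"mean y among extreme points: {mean_y_extreme:.2f}")
print(f"sd of x: {sd_x:.2f}, sd of y: {sd_y:.2f}")
print(f"share of points with y further from the mean than x: {share_further:.2f}")
```

In this setup the extreme group regresses on average, yet the spread of y should match the spread of x, and roughly half of all points should land further from the mean than where they started.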

"Regression toward the mean. That is, in any series of random events an extraordinary event is most likely to be followed, due purely to chance, by a more ordinary one." (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"Regression does not describe changes in ability that happen as time passes […]. Regression is caused by performances fluctuating about ability, so that performances far from the mean reflect abilities that are closer to the mean." (Gary Smith, "Standard Deviations", 2014)

"We encounter regression in many contexts - pretty much whenever we see an imperfect measure of what we are trying to measure. Standardized tests are obviously an imperfect measure of ability. [...] Each experimental score is an imperfect measure of “ability,” the benefits from the layout. To the extent there is randomness in this experiment - and there surely is - the prospective benefits from the layout that has the highest score are probably closer to the mean than was the score." (Gary Smith, "Standard Deviations", 2014)

"When a trait, such as academic or athletic ability, is measured imperfectly, the observed differences in performance exaggerate the actual differences in ability. Those who perform the best are probably not as far above average as they seem. Nor are those who perform the worst as far below average as they seem. Their subsequent performances will consequently regress to the mean." (Gary Smith, "Standard Deviations", 2014)
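Smith's argument about imperfect measurement can be sketched in a few lines. The numbers below are assumptions chosen for illustration and do not come from the book: ability has mean 100 and standard deviation 10, and each test score adds independent noise with standard deviation 10. The names `ability`, `test1`, `test2`, and `top` are likewise hypothetical.

```python
import random
import statistics

rng = random.Random(1)
n = 10000

# Assumed model: observed score = true ability + independent measurement noise.
ability = [rng.gauss(100.0, 10.0) for _ in range(n)]
test1 = [a + rng.gauss(0.0, 10.0) for a in ability]
test2 = [a + rng.gauss(0.0, 10.0) for a in ability]

# Select the top 10% of performers on the first test.
cutoff = sorted(test1)[int(0.9 * n)]
top = [i for i in range(n) if test1[i] >= cutoff]

top_test1 = statistics.fmean([test1[i] for i in top])
top_ability = statistics.fmean([ability[i] for i in top])
top_test2 = statistics.fmean([test2[i] for i in top])

print(f"top group, first test:   {top_test1:.1f}")
print(f"top group, true ability: {top_ability:.1f}")
print(f"top group, second test:  {top_test2:.1f}")
```

Under these assumptions the top group's first-test average overstates its average ability, and its second-test average should fall back to about that ability level while staying above the overall mean of 100.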

"The term shrinkage is used in regression modeling to denote two ideas. The first meaning relates to the slope of a calibration plot, which is a plot of observed responses against predicted responses. When a dataset is used to fit the model parameters as well as to obtain the calibration plot, the usual estimation process will force the slope of observed versus predicted values to be one. When, however, parameter estimates are derived from one dataset and then applied to predict outcomes on an independent dataset, overfitting will cause the slope of the calibration plot (i.e., the shrinkage factor) to be less than one, a result of regression to the mean. Typically, low predictions will be too low and high predictions too high. Predictions near the mean predicted value will usually be quite accurate. The second meaning of shrinkage is a statistical estimation method that preshrinks regression coefficients towards zero so that the calibration plot for new data will not need shrinkage as its calibration slope will be one." (Frank E. Harrell Jr., "Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis" 2nd Ed, 2015)

"Often when people relate essentially the same variable in two different groups, or at two different times, they see this same phenomenon - the tendency of the response variable to be closer to the mean than the predicted value. Unfortunately, people try to interpret this by thinking that the performance of those far from the mean is deteriorating, but it’s just a mathematical fact about the correlation. So, today we try to be less judgmental about this phenomenon and we call it regression to the mean. We managed to get rid of the term 'mediocrity', but the name regression stuck as a name for the whole least squares fitting procedure - and that’s where we get the term regression line." (Richard D De Veaux et al, "Stats: Data and Models", 2016)

"Regression toward the mean is pervasive. In sports, excellent performance tends to be followed by good, but less outstanding, performance. [...] By contrast, the good news about regression toward the mean is that very poor performance tends to be followed by improved performance. If you got the worst score in your statistics class on the first exam, you probably did not do so poorly on the second exam (but you were probably still below the mean)." (Alan Agresti et al, "Statistics: The Art and Science of Learning from Data" 4th Ed., 2018)

