SQL Troubles

03 November 2018

🔭Data Science: Tails (Just the Quotes)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney,Facts from Figures", 1951)

"Logging size transforms the original skewed distribution into a more symmetrical one by pulling in the long right tail of the distribution toward the mean. The short left tail is, in addition, stretched. The shift toward symmetrical distribution produced by the log transform is not, of course, merely for convenience. Symmetrical distributions, especially those that resemble the normal distribution, fulfill statistical assumptions that form the basis of statistical significance testing in the regression model." (Edward R Tufte,Data Analysis for Politics and Policy", 1974)

"Equal variability is not always achieved in plots. For instance, if the theoretical distribution for a probability plot has a density that drops off gradually to zero in the tails" (as the normal density does), then the variability of the data in the tails of the probability plot is greater than in the center. Another example is provided by the histogram. Since the height of any one bar has a binomial distribution, the standard deviation of the height is approximately proportional to the square root of the expected height; hence, the variability of the longer bars is greater." (John M Chambers et al,Graphical Methods for Data Analysis", 1983)

"If the sample is not representative of the population because the sample is small or biased, not selected at random, or its constituents are not independent of one another, then the bootstrap will fail. […] For a given size sample, bootstrap estimates of percentiles in the tails will always be less accurate than estimates of more centrally located percentiles. Similarly, bootstrap interval estimates for the variance of a distribution will always be less accurate than estimates of central location such as the mean or median because the variance depends strongly upon extreme values in the population." (Phillip I Good & James W Hardin,Common Errors in Statistics" (and How to Avoid Them)", 2003)

"Bell curves don't differ that much in their bells. They differ in their tails. The tails describe how frequently rare events occur. They describe whether rare events really are so rare. This leads to the saying that the devil is in the tails." (Bart Kosko,Noise", 2006)

"Readability in visualization helps people interpret data and make conclusions about what the data has to say. Embed charts in reports or surround them with text, and you can explain results in detail. However, take a visualization out of a report or disconnect it from text that provides context" (as is common when people share graphics online), and the data might lose its meaning; or worse, others might misinterpret what you tried to show." (Nathan Yau,Data Points: Visualization That Means Something", 2013)

"A very different - and very incorrect - argument is that successes must be balanced by failures (and failures by successes) so that things average out. Every coin flip that lands heads makes tails more likely. Every red at roulette makes black more likely. […] These beliefs are all incorrect. Good luck will certainly not continue indefinitely, but do not assume that good luck makes bad luck more likely, or vice versa." (Gary Smith,Standard Deviations", 2014)

"The more complex the system, the more variable (risky) the outcomes. The profound implications of this essential feature of reality still elude us in all the practical disciplines. Sometimes variance averages out, but more often fat-tail events beget more fat-tail events because of interdependencies. If there are multiple projects running, outlier (fat-tail) events may also be positively correlated - one IT project falling behind will stretch resources and increase the likelihood that others will be compromised." (Paul Gibbons,The Science of Successful Organizational Change", 2015)

"Many statistical procedures perform more effectively on data that are normally distributed, or at least are symmetric and not excessively kurtotic" (fat-tailed), and where the mean and variance are approximately constant. Observed time series frequently require some form of transformation before they exhibit these distributional properties, for in their 'raw' form they are often asymmetric." (Terence C Mills,Applied Time Series Analysis: A practical guide to modeling and forecasting", 2019)

"Mean-averages can be highly misleading when the raw data do not form a symmetric pattern around a central value but instead are skewed towards one side [...], typically with a large group of standard cases but with a tail of a few either very high" (for example, income) or low" (for example, legs) values." (David Spiegelhalter,The Art of Statistics: Learning from Data", 2019)

"[…] it is not merely that events in the tails of the distributions matter, happen, play a large role, etc. The point is that these events play the major role and their probabilities are not" (easily) computable, not reliable for any effective use. The implication is that Black Swans do not necessarily come from fat tails; the problem can result from an incomplete assessment of tail events." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"[…] whenever people make decisions after being supplied with the standard deviation number, they act as if it were the expected mean deviation." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"Behavioral finance so far makes conclusions from statics not dynamics, hence misses the picture. It applies trade-offs out of context and develops the consensus that people irrationally overestimate tail risk" (hence need to be 'nudged' into taking more of these exposures). But the catastrophic event is an absorbing barrier. No risky exposure can be analyzed in isolation: risks accumulate. If we ride a motorcycle, smoke, fly our own propeller plane, and join the mafia, these risks add up to a near-certain premature death. Tail risks are not a renewable resource." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"But note that any heavy tailed process, even a power law, can be described in sample" (that is finite number of observations necessarily discretized) by a simple Gaussian process with changing variance, a regime switching process, or a combination of Gaussian plus a series of variable jumps" (though not one where jumps are of equal size […])." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"Once we know something is fat-tailed, we can use heuristics to see how an exposure there reacts to random events: how much is a given unit harmed by them. It is vastly more effective to focus on being insulated from the harm of random events than try to figure them out in the required details" (as we saw the inferential errors under thick tails are huge). So it is more solid, much wiser, more ethical, and more effective to focus on detection heuristics and policies rather than fabricate statistical properties." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"No one sees further into a generalization than his own knowledge of detail extends." (William James)

"Remember that a p-value merely indicates the probability of a particular set of data being generated by the null model–it has little to say about the size of a deviation from that model" (especially in the tails of the distribution, where large changes in effect size cause only small changes in p-values)." (Clay Helberg)

🔭Data Science: Forecasting (Just the Quotes)

"Extrapolations are useful, particularly in the form of soothsaying called forecasting trends. But in looking at the figures or the charts made from them, it is necessary to remember one thing constantly: The trend to now may be a fact, but the future trend represents no more than an educated guess. Implicit in it is 'everything else being equal' and 'present trends continuing'. And somehow everything else refuses to remain equal." (Darell Huff, "How to Lie with Statistics", 1954)

"When numbers in tabular form are taboo and words will not do the work well as is often the case. There is one answer left: Draw a picture. About the simplest kind of statistical picture or graph, is the line variety. It is very useful for showing trends, something practically everybody is interested in showing or knowing about or spotting or deploring or forecasting." (Darell Huff, "How to Lie with Statistics", 1954)

"The moment you forecast you know you’re going to be wrong, you just don’t know when and in which direction." (Edgar R Fiedler, 1977)

"Many of the basic functions performed by neural networks are mirrored by human abilities. These include making distinctions between items (classification), dividing similar things into groups (clustering), associating two or more things (associative memory), learning to predict outcomes based on examples (modeling), being able to predict into the future (time-series forecasting), and finally juggling multiple goals and coming up with a good- enough solution (constraint satisfaction)." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Probability theory is a serious instrument for forecasting, but the devil, as they say, is in the details - in the quality of information that forms the basis of probability estimates." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Under conditions of uncertainty, both rationality and measurement are essential to decision-making. Rational people process information objectively: whatever errors they make in forecasting the future are random errors rather than the result of a stubborn bias toward either optimism or pessimism. They respond to new information on the basis of a clearly defined set of preferences. They know what they want, and they use the information in ways that support their preferences." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Time-series forecasting is essentially a form of extrapolation in that it involves fitting a model to a set of data and then using that model outside the range of data to which it has been fitted. Extrapolation is rightly regarded with disfavour in other statistical areas, such as regression analysis. However, when forecasting the future of a time series, extrapolation is unavoidable." (Chris Chatfield, "Time-Series Forecasting" 2nd Ed, 2000)

"Models can be viewed and used at three levels. The first is a model that fits the data. A test of goodness-of-fit operates at this level. This level is the least useful but is frequently the one at which statisticians and researchers stop. For example, a test of a linear model is judged good when a quadratic term is not significant. A second level of usefulness is that the model predicts future observations. Such a model has been called a forecast model. This level is often required in screening studies or studies predicting outcomes such as growth rate. A third level is that a model reveals unexpected features of the situation being described, a structural model, [...] However, it does not explain the data." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Most long-range forecasts of what is technically feasible in future time periods dramatically underestimate the power of future developments because they are based on what I call the 'intuitive linear' view of history rather than the 'historical exponential' view." (Ray Kurzweil, "The Singularity is Near", 2005)

"A forecaster should almost never ignore data, especially when she is studying rare events […]. Ignoring data is often a tip-off that the forecaster is overconfident, or is overfitting her model - that she is interested in showing off rather than trying to be accurate." (Nate Silver, "The Signal and the Noise: Why So Many Predictions Fail-but Some Don't", 2012)

"Whether information comes in a quantitative or qualitative flavor is not as important as how you use it. [...] The key to making a good forecast […] is not in limiting yourself to quantitative information. Rather, it’s having a good process for weighing the information appropriately. […] collect as much information as possible, but then be as rigorous and disciplined as possible when analyzing it. [...] Many times, in fact, it is possible to translate qualitative information into quantitative information." (Nate Silver, "The Signal and the Noise: Why So Many Predictions Fail-but Some Don't", 2012)

"In common usage, prediction means to forecast a future event. In data science, prediction more generally means to estimate an unknown value. This value could be something in the future (in common usage, true prediction), but it could also be something in the present or in the past. Indeed, since data mining usually deals with historical data, models very often are built and tested using events from the past." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Using random processes in our models allows economists to capture the variability of time series data, but it also poses challenges to model builders. As model builders, we must understand the uncertainty from two different perspectives. Consider first that of the econometrician, standing outside an economic model, who must assess its congruence with reality, inclusive of its random perturbations. An econometrician’s role is to choose among different parameters that together describe a family of possible models to best mimic measured real world time series and to test the implications of these models. I refer to this as outside uncertainty. Second, agents inside our model, be it consumers, entrepreneurs, or policy makers, must also confront uncertainty as they make decisions. I refer to this as inside uncertainty, as it pertains to the decision-makers within the model. What do these agents know? From what information can they learn? With how much confidence do they forecast the future? The modeler’s choice regarding insiders’ perspectives on an uncertain future can have significant consequences for each model’s equilibrium outcomes." (Lars P Hansen, "Uncertainty Outside and Inside Economic Models", [Nobel lecture] 2013)

"One important thing to bear in mind about the outputs of data science and analytics is that in the vast majority of cases they do not uncover hidden patterns or relationships as if by magic, and in the case of predictive analytics they do not tell us exactly what will happen in the future. Instead, they enable us to forecast what may come. In other words, once we have carried out some modelling there is still a lot of work to do to make sense out of the results obtained, taking into account the constraints and assumptions in the model, as well as considering what an acceptable level of reliability is in each scenario." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017)

"Regression describes the relationship between an exploratory variable (i.e., independent) and a response variable (i.e., dependent). Exploratory variables are also referred to as predictors and can have a frequency of more than 1. Regression is being used within the realm of predictions and forecasting. Regression determines the change in response variable when one exploratory variable is varied while the other independent variables are kept constant. This is done to understand the relationship that each of those exploratory variables exhibits." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"The first myth is that prediction is always based on time-series extrapolation into the future (also known as forecasting). This is not the case: predictive analytics can be applied to generate any type of unknown data, including past and present. In addition, prediction can be applied to non-temporal (time-based) use cases such as disease progression modeling, human relationship modeling, and sentiment analysis for medication adherence, etc. The second myth is that predictive analytics is a guarantor of what will happen in the future. This also is not the case: predictive analytics, due to the nature of the insights they create, are probabilistic and not deterministic. As a result, predictive analytics will not be able to ensure certainty of outcomes." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"We know what forecasting is: you start in the present and try to look into the future and imagine what it will be like. Backcasting is the opposite: you state your desired vision of the future as if it’s already happened, and then work backward to imagine the practices, policies, programs, tools, training, and people who worked in concert in a hypothetical past (which takes place in the future) to get you there." (Eben Hewitt, "Technology Strategy Patterns: Architecture as strategy" 2nd Ed., 2019)

"Ideally, a decision maker or a forecaster will combine the outside view and the inside view - or, similarly, statistics plus personal experience. But it’s much better to start with the statistical view, the outside view, and then modify it in the light of personal experience than it is to go the other way around. If you start with the inside view you have no real frame of reference, no sense of scale - and can easily come up with a probability that is ten times too large, or ten times too small." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

02 November 2018

🔭Data Science: Nonlinearity (Just the Quotes)

"The term chaos is used in a specific sense where it is an inherently random pattern of behaviour generated by fixed inputs into deterministic (that is fixed) rules (relationships). The rules take the form of non-linear feedback loops. Although the specific path followed by the behaviour so generated is random and hence unpredictable in the long-term, it always has an underlying pattern to it, a 'hidden' pattern, a global pattern or rhythm. That pattern is self-similarity, that is a constant degree of variation, consistent variability, regular irregularity, or more precisely, a constant fractal dimension. Chaos is therefore order (a pattern) within disorder (random behaviour)." (Ralph D Stacey, "The Chaos Frontier: Creative Strategic Control for Business", 1991)

"In nonlinear systems - and the economy is most certainly nonlinear - chaos theory tells you that the slightest uncertainty in your knowledge of the initial conditions will often grow inexorably. After a while, your predictions are nonsense." (M Mitchell Waldrop, "Complexity: The Emerging Science at the Edge of Order and Chaos", 1992)

"In addition to dimensionality requirements, chaos can occur only in nonlinear situations. In multidimensional settings, this means that at least one term in one equation must be nonlinear while also involving several of the variables. With all linear models, solutions can be expressed as combinations of regular and linear periodic processes, but nonlinearities in a model allow for instabilities in such periodic solutions within certain value ranges for some of the parameters." (Courtney Brown, "Chaos and Catastrophe Theories", 1995)

"The dimensionality and nonlinearity requirements of chaos do not guarantee its appearance. At best, these conditions allow it to occur, and even then under limited conditions relating to particular parameter values. But this does not imply that chaos is rare in the real world. Indeed, discoveries are being made constantly of either the clearly identifiable or arguably persuasive appearance of chaos. Most of these discoveries are being made with regard to physical systems, but the lack of similar discoveries involving human behavior is almost certainly due to the still developing nature of nonlinear analyses in the social sciences rather than the absence of chaos in the human setting." (Courtney Brown, "Chaos and Catastrophe Theories", 1995)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand. [...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"There is a new science of complexity which says that the link between cause and effect is increasingly difficult to trace; that change (planned or otherwise) unfolds in non-linear ways; that paradoxes and contradictions abound; and that creative solutions arise out of diversity, uncertainty and chaos." (Andy P Hargreaves & Michael Fullan, "What’s Worth Fighting for Out There?", 1998)

"A system may be called complex here if its dimension (order) is too high and its model (if available) is nonlinear, interconnected, and information on the system is uncertain such that classical techniques can not easily handle the problem." (M Jamshidi, "Autonomous Control on Complex Systems: Robotic Applications", Current Advances in Mechanical Design and Production VII, 2000)

"Most physical systems, particularly those complex ones, are extremely difficult to model by an accurate and precise mathematical formula or equation due to the complexity of the system structure, nonlinearity, uncertainty, randomness, etc. Therefore, approximate modeling is often necessary and practical in real-world applications. Intuitively, approximate modeling is always possible. However, the key questions are what kind of approximation is good, where the sense of 'goodness' has to be first defined, of course, and how to formulate such a good approximation in modeling a system such that it is mathematically rigorous and can produce satisfactory results in both theory and applications." (Guanrong Chen & Trung Tat Pham, "Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems", 2001)

"Swarm intelligence can be effective when applied to highly complicated problems with many nonlinear factors, although it is often less effective than the genetic algorithm approach discussed later in this chapter. Swarm intelligence is related to swarm optimization […]. As with swarm intelligence, there is some evidence that at least some of the time swarm optimization can produce solutions that are more robust than genetic algorithms. Robustness here is defined as a solution’s resistance to performance degradation when the underlying variables are changed." (Michael J North & Charles M Macal, "Managing Business Complexity: Discovering Strategic Solutions with Agent-Based Modeling and Simulation", 2007)

"Thus, nonlinearity can be understood as the effect of a causal loop, where effects or outputs are fed back into the causes or inputs of the process. Complex systems are characterized by networks of such causal loops. In a complex, the interdependencies are such that a component A will affect a component B, but B will in general also affect A, directly or indirectly. A single feedback loop can be positive or negative. A positive feedback will amplify any variation in A, making it grow exponentially. The result is that the tiniest, microscopic difference between initial states can grow into macroscopically observable distinctions." (Carlos Gershenson, "Design and Control of Self-organizing Systems", 2007)

"All forms of complex causation, and especially nonlinear transformations, admittedly stack the deck against prediction. Linear describes an outcome produced by one or more variables where the effect is additive. Any other interaction is nonlinear. This would include outcomes that involve step functions or phase transitions. The hard sciences routinely describe nonlinear phenomena. Making predictions about them becomes increasingly problematic when multiple variables are involved that have complex interactions. Some simple nonlinear systems can quickly become unpredictable when small variations in their inputs are introduced." (Richard N Lebow, "Forbidden Fruit: Counterfactuals and International Relations", 2010)

"Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"Complexity is a relative term. It depends on the number and the nature of interactions among the variables involved. Open loop systems with linear, independent variables are considered simpler than interdependent variables forming nonlinear closed loops with a delayed response." (Jamshid Gharajedaghi, "Systems Thinking: Managing Chaos and Complexity A Platform for Designing Business Architecture" 3rd Ed., 2011)

"We have minds that are equipped for certainty, linearity and short-term decisions, that must instead make long-term decisions in a non-linear, probabilistic world." (Paul Gibbons, "The Science of Successful Organizational Change", 2015)

"Random forests are essentially an ensemble of trees. They use many short trees, fitted to multiple samples of the data, and the predictions are averaged for each observation. This helps to get around a problem that trees, and many other machine learning techniques, are not guaranteed to find optimal models, in the way that linear regression is. They do a very challenging job of fitting non-linear predictions over many variables, even sometimes when there are more variables than there are observations. To do that, they have to employ 'greedy algorithms', which find a reasonably good model but not necessarily the very best model possible." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Exponentially growing systems are prevalent in nature, spanning all scales from biochemical reaction networks in single cells to food webs of ecosystems. How exponential growth emerges in nonlinear systems is mathematically unclear. […] The emergence of exponential growth from a multivariable nonlinear network is not mathematically intuitive. This indicates that the network structure and the flux functions of the modeled system must be subjected to constraints to result in long-term exponential dynamics." (Wei-Hsiang Lin et al, "Origin of exponential growth in nonlinear reaction networks", PNAS 117 (45), 2020)

"Non-linear associations are also quantifiable. Even linear regression can be used to model some non-linear relationships. This is possible because linear regression has to be linear in parameters, not necessarily in the data. More complex relationships can be quantified using entropy-based metrics such as mutual information. Linear models can also handle interaction terms. We talk about interaction when the model’s output depends on a multiplicative relationship between two or more variables." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

🔭Data Science: Linearity (Just the Quotes)

"There are several key issues in the field of statistics that impact our analyses once data have been imported into a software program. These data issues are commonly referred to as the measurement scale of variables, restriction in the range of data, missing data values, outliers, linearity, and nonnormality." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"Without precise predictability, control is impotent and almost meaningless. In other words, the lesser the predictability, the harder the entity or system is to control, and vice versa. If our universe actually operated on linear causality, with no surprises, uncertainty, or abrupt changes, all future events would be absolutely predictable in a sort of waveless orderliness." (Lawrence K Samuels, "Defense of Chaos", 2013)

"An oft-repeated rule of thumb in any sort of statistical model fitting is 'you can't fit a model with more parameters than data points'. This idea appears to be as wide-spread as it is incorrect. On the contrary, if you construct your models carefully, you can fit models with more parameters than datapoints [...]. A model with more parameters than datapoints is known as an under-determined system, and it's a common misperception that such a model cannot be solved in any circumstance. [...] this misconception, which I like to call the 'model complexity myth' [...] is not true in general, it is true in the specific case of simple linear models, which perhaps explains why the myth is so pervasive." (Jake Vanderplas, "The Model Complexity Myth", 2015) [source]

See also the quotes on linearity in Graphical Representation

🔭Data Science: Data Analysts (Just the Quotes)

"The physical sciences are used to ‘praying over’ their data, examining the same data from a variety of points of view. This process has been very rewarding, and has led to many extremely valuable insights. Without this sort of flexibility, progress in physical science would have been much slower. Flexibility in analysis is often to be had honestly at the price of a willingness not to demand that what has already been observed shall establish, or prove, what analysis suggests. In physical science generally, the results of praying over the data are thought of as something to be put to further test in another experiment, as indications rather than conclusions." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics Vol. 33 (1), 1962)

"[…] it is not enough to say: 'There's error in the data and therefore the study must be terribly dubious'. A good critic and data analyst must do more: he or she must also show how the error in the measurement or the analysis affects the inferences made on the basis of that data and analysis." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The use of statistical methods to analyze data does not make a study any more 'scientific', 'rigorous', or 'objective'. The purpose of quantitative analysis is not to sanctify a set of findings. Unfortunately, some studies, in the words of one critic, 'use statistics as a drunk uses a street lamp, for support rather than illumination'. Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Detailed study of the quality of data sources is an essential part of applied work. [...] Data analysts need to understand more about the measurement processes through which their data come. To know the name by which a column of figures is headed is far from being enough." (John W Tukey, "An Overview of Techniques of Data Analysis, Emphasizing Its Exploratory Aspects", 1982)

"Like a detective, a data analyst will experience many dead ends, retrace his steps, and explore many alternatives before settling on a single description of the evidence in front of him." (David Lubinsky & Daryl Pregibon , "Data analysis as search", Journal of Econometrics Vol. 38 (1–2), 1988)

"The four questions of data analysis are the questions of description, probability, inference, and homogeneity. Any data analyst needs to know how to organize and use these four questions in order to obtain meaningful and correct results. [...]
THE DESCRIPTION QUESTION: Given a collection of numbers, are there arithmetic values that will summarize the information contained in those numbers in some meaningful way?
THE PROBABILITY QUESTION: Given a known universe, what can we say about samples drawn from this universe? [...]
THE INFERENCE QUESTION: Given an unknown universe, and given a sample that is known to have been drawn from that unknown universe, and given that we know everything about the sample, what can we say about the unknown universe? [...]
THE HOMOGENEITY QUESTION: Given a collection of observations, is it reasonable to assume that they came from one universe, or do they show evidence of having come from multiple universes?" (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"[…] the data itself can lead to new questions too. In exploratory data analysis (EDA), for example, the data analyst discovers new questions based on the data. The process of looking at the data to address some of these questions generates incidental visualizations - odd patterns, outliers, or surprising correlations that are worth looking into further." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Plotting numbers on a chart does not make you a data analyst. Knowing and understanding your data before you communicate it to your audience does." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Also, remember that data literacy is not just a set of technical skills. There is an equal need and weight for soft skills and business skills. This can be misleading for some technical resources within an organization, as those technical resources may believe they are data literate by default as they are data architects or data analysts. They have the existing technical skills, but maybe they do not have any deep proficiencies in other skills such as communicating with data, challenging assumptions, and mitigating bias, or perhaps they do not have an open mindset to be open to different perspectives." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"The lack of focus and commitment to color is a perplexing thing. When used correctly, color has no equal as a visualization tool - in advertising, in branding, in getting the message across to any audience you seek. Data analysts can make numbers dance and sing on command, but they sometimes struggle to create visually stimulating environments that convince the intended audience to tap their feet in time." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

🔭Data Science: Skewness (Just the Quotes)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney, "Facts from Figures", 1951)

"Logging skewed variables also helps to reveal the patterns in the data. […] the rescaling of the variables by taking logarithms reduces the nonlinearity in the relationship and removes much of the clutter resulting from the skewed distributions on both variables; in short, the transformation helps clarify the relationship between the two variables. It also […] leads to a theoretically meaningful regression coefficient." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The logarithmic transformation serves several purposes: (1) The resulting regression coefficients sometimes have a more useful theoretical interpretation compared to a regression based on unlogged variables. (2) Badly skewed distributions - in which many of the observations are clustered together combined with a few outlying values on the scale of measurement - are transformed by taking the logarithm of the measurements so that the clustered values are spread out and the large values pulled in more toward the middle of the distribution. (3) Some of the assumptions underlying the regression model and the associated significance tests are better met when the logarithm of the measured variables is taken." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The logarithm is an extremely powerful and useful tool for graphical data presentation. One reason is that logarithms turn ratios into differences, and for many sets of data, it is natural to think in terms of ratios. […] Another reason for the power of logarithms is resolution. Data that are amounts or counts are often very skewed to the right; on graphs of such data, there are a few large values that take up most of the scale and the majority of the points are squashed into a small region of the scale with no resolution." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"It is common for positive data to be skewed to the right: some values bunch together at the low end of the scale and others trail off to the high end with increasing gaps between the values as they get higher. Such data can cause severe resolution problems on graphs, and the common remedy is to take logarithms. Indeed, it is the frequent success of this remedy that partly accounts for the large use of logarithms in graphical data display." (William S Cleveland, "The Elements of Graphing Data", 1985)

"If a distribution were perfectly symmetrical, all symmetry-plot points would be on the diagonal line. Off-line points indicate asymmetry. Points fall above the line when distance above the median is greater than corresponding distance below the median. A consistent run of above-the-line points indicates positive skew; a run of below-the-line points indicates negative skew." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Skewness is a measure of symmetry. For example, it's zero for the bell-shaped normal curve, which is perfectly symmetric about its mean. Kurtosis is a measure of the peakedness, or fat-tailedness, of a distribution. Thus, it measures the likelihood of extreme values." (John L Casti, "Reality Rules: Picturing the world in mathematics", 1992)

"Data that are skewed toward large values occur commonly. Any set of positive measurements is a candidate. Nature just works like that. In fact, if data consisting of positive numbers range over several powers of ten, it is almost a guarantee that they will be skewed. Skewness creates many problems. There are visualization problems. A large fraction of the data are squashed into small regions of graphs, and visual assessment of the data degrades. There are characterization problems. Skewed distributions tend to be more complicated than symmetric ones; for example, there is no unique notion of location and the median and mean measure different aspects of the distribution. There are problems in carrying out probabilistic methods. The distribution of skewed data is not well approximated by the normal, so the many probabilistic methods based on an assumption of a normal distribution cannot be applied." (William S Cleveland, "Visualizing Data", 1993)

"The logarithm is one of many transformations that we can apply to univariate measurements. The square root is another. Transformation is a critical tool for visualization or for any other mode of data analysis because it can substantially simplify the structure of a set of data. For example, transformation can remove skewness toward large values, and it can remove monotone increasing spread. And often, it is the logarithm that achieves this removal." (William S Cleveland, "Visualizing Data", 1993)

"When the distributions of two or more groups of univariate data are skewed, it is common to have the spread increase monotonically with location. This behavior is monotone spread. Strictly speaking, monotone spread includes the case where the spread decreases monotonically with location, but such a decrease is much less common for raw data. Monotone spread, as with skewness, adds to the difficulty of data analysis. For example, it means that we cannot fit just location estimates to produce homogeneous residuals; we must fit spread estimates as well. Furthermore, the distributions cannot be compared by a number of standard methods of probabilistic inference that are based on an assumption of equal spreads; the standard t-test is one example. Fortunately, remedies for skewness can cure monotone spread as well." (William S Cleveland, "Visualizing Data", 1993)

"The standard deviation (often SD) is a measure of variability. When we calculate the standard deviation of a sample, we are using it as an estimate of the variability of the population from which the sample was drawn. For data with a normal distribution, about 95% of individu als will have values within 2 standard deviations of the mean, the other 5% being equally scattered above and below these limits. Contrary to popular misconception, the standard deviation is a valid measure of variability regardless of the distribution. About 95% of observa tions of any distribution usually fall within the 2 standard deviation limits, though those outside may all be at one end. We may choose a different summary statistic, how ever, when data have a skewed distribution." (Douglas G Altman & J Martin Bland, "Statistics Notes: Standard Deviations And Standard Errors", British Medical Journal Vol. 331 (7521) 2005)

"Use a logarithmic scale when it is important to understand percent change or multiplicative factors. […] Showing data on a logarithmic scale can cure skewness toward large values." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Distributional shape is an important attribute of data, regardless of whether scores are analyzed descriptively or inferentially. Because the degree of skewness can be summarized by means of a single number, and because computers have no difficulty providing such measures (or estimates) of skewness, those who prepare research reports should include a numerical index of skewness every time they provide measures of central tendency and variability." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"[The normality] assumption is the least important one for the reliability of the statistical procedures under discussion. Violations of the normality assumption can be divided into two general forms: Distributions that have heavier tails than the normal and distributions that are skewed rather than symmetric. If data is skewed, the formulas we are discussing are still valid as long as the sample size is sufficiently large. Although the guidance about 'how skewed' and 'how large a sample' can be quite vague, since the greater the skew, the larger the required sample size. For the data commonly used in time series and for the sample sizes (which are generally quite large) used, skew is not a problem. On the other hand, heavy tails can be very problematic." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

"In statistical theory, location and variability are referred to as the first and second moments of a distribution. The third and fourth moments are called skewness and kurtosis. Skewness refers to whether the data is skewed to larger or smaller values and kurtosis indicates the propensity of the data to have extreme values. Generally, metrics are not used to measure skewness and kurtosis; instead, these are discovered through visual displays [...]" (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"A histogram represents the frequency distribution of the data. Histograms are similar to bar charts but group numbers into ranges. Also, a histogram lets you show the frequency distribution of continuous data. This helps in analyzing the distribution (for example, normal or Gaussian), any outliers present in the data, and skewness." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"New information is constantly flowing in, and your brain is constantly integrating it into this statistical distribution that creates your next perception (so in this sense 'reality' is just the product of your brain’s ever-evolving database of consequence). As such, your perception is subject to a statistical phenomenon known in probability theory as kurtosis. Kurtosis in essence means that things tend to become increasingly steep in their distribution [...] that is, skewed in one direction. This applies to ways of seeing everything from current events to ourselves as we lean 'skewedly' toward one interpretation, positive or negative. Things that are highly kurtotic, or skewed, are hard to shift away from. This is another way of saying that seeing differently isn’t just conceptually difficult - it’s statistically difficult." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"Mean-averages can be highly misleading when the raw data do not form a symmetric pattern around a central value but instead are skewed towards one side [...], typically with a large group of standard cases but with a tail of a few either very high (for example, income) or low (for example, legs) values." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"With skewed data, quantiles will reflect the skew, while adding standard deviations assumes symmetry in the distribution and can be misleading." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Adjusting scale is an important practice in data visualization. While the log transform is versatile, it doesn’t handle all situations where skew or curvature occurs. For example, at times the values are all roughly the same order of magnitude and the log transformation has little impact. Another transformation to consider is the square root transformation, which is often useful for count data." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

Data Science: Torturing the Data in Statistics

Statistics, through its methods, techniques and models rooted in mathematical reasoning, allows exploring, analyzing and summarizing a given set of data, and is used to support decision-making, experiments and theories, and ultimately to gain and communicate insights. When used adequately, statistics can prove to be a useful toolset; however, as soon as its use deviates from the mathematical rigor and principles on which it was built, it can easily be misused. Moreover, the results obtained with the help of statistics can easily be distorted in communication, even when the statistical results are valid.

The ease with which statistics can be misused is probably best reflected in sayings like 'if you torture the data long enough it will confess'. Several sources attribute this formulation to the economist Ronald H Coase; according to Coase, however, the remark he made in the 1960s was slightly different: 'if you torture the data enough, nature will always confess' (see [1]). The latter formulation is not necessarily negative if one considers the persistence researchers need to reveal nature’s secrets. By contrast, the former formulation seems to stress only the negative aspect.

The word 'torture' seems to be used in place of 'abuse' because, as a metaphor, it carries more weight: it draws attention and sticks with the reader or audience. As Quote Investigator remarks [1], ‘torturing the data’ was employed as a metaphor much earlier. For example, a 1933 article contains the following passage:

"The evidence submitted by the committee from its own questionnaire warrants no such conclusion. To torture the data given in Table I into evidence supporting a twelve-hour minimum of professional training is indeed a statistical feat, but one which the committee accomplishes to its own satisfaction." ("The Elementary School Journal" Vol. 33 (7), 1933)

More than a decade earlier, in a context similar to that of Coase's quote, John Dewey remarked:

"Active experimentation must force the apparent facts of nature into forms different to those in which they familiarly present themselves; and thus make them tell the truth about themselves, as torture may compel an unwilling witness to reveal what he has been concealing." (John Dewey, "Reconstruction in Philosophy", 1920)

Torture has been used metaphorically since the 1600s, if we consider the following quote from Sir Francis Bacon’s 'Advancement of Learning':

"Another diversity of Methods is according to the subject or matter which is handled; for there is a great difference in delivery of the Mathematics, which are the most abstracted of knowledges, and Policy, which is the most immersed […], yet we see how that opinion, besides the weakness of it, hath been of ill desert towards learning, as that which taketh the way to reduce learning to certain empty and barren generalities; being but the very husks and shells of sciences, all the kernel being forced out and expulsed with the torture and press of the method." (Sir Francis Bacon, Advancement of Learning, 1605)

However, a similar metaphor with a closer meaning can be found more than two centuries later:

"One very reprehensible mode of theory-making consists, after honest deductions from a few facts have been made, in torturing other facts to suit the end proposed, in omitting some, and in making use of any authority that may lend assistance to the object desired; while all those which militate against it are carefully put on one side or doubted." (Henry De la Beche, "Sections and Views, Illustrative of Geological Phaenomena", 1830)

The following quote from Goethe probably also deserves some attention:

"Someday someone will write a pathology of experimental physics and bring to light all those swindles which subvert our reason, beguile our judgement and, what is worse, stand in the way of any practical progress. The phenomena must be freed once and for all from their grim torture chamber of empiricism, mechanism, and dogmatism; they must be brought before the jury of man's common sense." (Johann Wolfgang von Goethe)

Alternatives to Coase’s formulation were used in several later sources, replacing 'data' with 'statistics' or 'numbers':

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M Stigler, "Neutral Models in Biology", 1987)

"Torture numbers, and they will confess to anything." (Gregg Easterbrook, New Republic, 1989)

"[…] an honest exploratory study should indicate how many comparisons were made […] most experts agree that large numbers of comparisons will produce apparently statistically significant findings that are actually due to chance. The data torturer will act as if every positive result confirmed a major hypothesis. The honest investigator will limit the study to focused questions, all of which make biologic sense. The cautious reader should look at the number of ‘significant’ results in the context of how many comparisons were made." (James L Mills, "Data torturing", New England Journal of Medicine, 1993)

"This is true only if you torture the statistics until they produce the confession you want." (Larry Schweikart, "Myths of the 1980s Distort Debate over Tax Cuts", 2001) [source]

"Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

There is also a psychological component to torturing data or facts until they fit one's picture of reality, a tendency that derives from the way the human mind works and from the limits and fallacies associated with its workings.

"What are the models? Well, the first rule is that you’ve got to have multiple models - because if you just have one or two that you’re using, the nature of human psychology is such that you’ll torture reality so that it fits your models, or at least you’ll think it does." (Charles Munger, 1994)

Independently of the formulation and context used, the fact remains: statistics (aka data, numbers) can be easily abused, and the reader/audience should be aware of it!

Previously published on quotablemath.blogspot.com.

🔭Data Science: Intelligence (Just the Quotes)

"To be able to discern that what is true is true, and that what is false is false, - this is the mark and character of intelligence." (Ralph W Emerson, "Essays", 1841)

"We study the complex in the simple; and only from the intuition of the lower can we safely proceed to the intellection of the higher degrees. The only danger lies in the leaping from low to high, with the neglect of the intervening gradations." (Samuel T Coleridge, "Physiology of Life", 1848)

"The accidental causes of science are only 'accidents' relatively to the intelligence of a man." (Chauncey Wright, "The Genesis of Species", North American Review, 1871)

"Does the harmony the human intelligence thinks it discovers in nature exist outside of this intelligence? No, beyond doubt, a reality completely independent of the mind which conceives it, sees or feels it, is an impossibility." (Henri Poincaré, "The Value of Science", 1905)

"No one can predict how far we shall be enabled by means of our limited intelligence to penetrate into the mysteries of a universe immeasurably vast and wonderful; nevertheless, each step in advance is certain to bring new blessings to humanity and new inspiration to greater endeavor." (Theodore W Richards, "The Fundamental Properties of the Elements", [Faraday lecture] 1911)

"It may be impossible for human intelligence to comprehend absolute truth, but it is possible to observe Nature with an unbiased mind and to bear truthful testimony of things seen." (Sir Richard A Gregory, "Discovery, Or, The Spirit and Service of Science", 1916)

"In other words then, if a machine is expected to be infallible, it cannot also be intelligent. There are several theorems which say almost exactly that. But these theorems say nothing about how much intelligence may be displayed if a machine makes no pretense at infallibility." (Alan M Turing, 1946)

"A computer would deserve to be called intelligent if it could deceive a human into believing that it was human." (Alan Turing, "Computing Machinery and Intelligence" , Mind Vol. 59, 1950)

"All intelligent endeavor stands with one foot on observation and the other on contemplation." (Gerald Holton & Duane H D Roller, "Foundations of Modern Physical Science", 1950)

"What in fact is the schema of the object? In one essential respect it is a schema belonging to intelligence. To have the concept of an object is to attribute the perceived figure to a substantial basis, so that the figure and the substance that it thus indicates continue to exist outside the perceptual field. The permanence of the object seen from this viewpoint is not only a product of intelligence, but constitutes the very first of those fundamental ideas of conservation which we shall see developing within the thought process." (Jean Piaget, "The Psychology of Intelligence", 1950)

"[…] observation is not enough, and it seems to me that in science, as in the arts, there is very little worth having that does not require the exercise of intuition as well as of intelligence, the use of imagination as well as of information." (Kathleen Lonsdale, "Facts About Crystals", American Scientist Vol. 39 (4), 1951)

"Concepts are for me specific mental abilities exercised in acts of judgment, and expressed in the intelligent use of words (though not exclusively in such use)." (Peter T Geach, "Mental Acts: Their Content and their Objects", 1954)

"The following are some aspects of the artificial intelligence problem: […] If a machine can do a job, then an automatic calculator can be programmed to simulate the machine. […] It may be speculated that a large part of human thought consists of manipulating words according to rules of reasoning and rules of conjecture. From this point of view, forming a generalization consists of admitting a new word and some rules whereby sentences containing it imply and are implied by others. This idea has never been very precisely formulated nor have examples been worked out. […] How can a set of (hypothetical) neurons be arranged so as to form concepts. […] to get a measure of the efficiency of a calculation it is necessary to have on hand a method of measuring the complexity of calculating devices which in turn can be done. […] Probably a truly intelligent machine will carry out activities which may best be described as self-improvement. […] A number of types of 'abstraction' can be distinctly defined and several others less distinctly. […] the difference between creative thinking and unimaginative competent thinking lies in the injection of a some randomness. The randomness must be guided by intuition to be efficient." (John McCarthy et al, "A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence", 1955)

"Solving problems is the specific achievement of intelligence." (George Polya, 1957)

"Computers do not decrease the need for mathematical analysis, but rather greatly increase this need. They actually extend the use of analysis into the fields of computers and computation, the former area being almost unknown until recently, the latter never having been as intensively investigated as its importance warrants. Finally, it is up to the user of computational equipment to define his needs in terms of his problems, In any case, computers can never eliminate the need for problem-solving through human ingenuity and intelligence." (Richard E Bellman & Paul Brock, "On the Concepts of a Problem and Problem-Solving", American Mathematical Monthly 67, 1960)

"Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an 'intelligence explosion:, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make." (Irving J Good, "Speculations Concerning the First Ultraintelligent Machine", Advances in Computers Vol. 6, 1965)

"When intelligent machines are constructed, we should not be surprised to find them as confused and as stubborn as men in their convictions about mind-matter, consciousness, free will, and the like." (Marvin Minsky, "Matter, Mind, and Models", Proceedings of the International Federation of Information Processing Congress Vol. 1 (49), 1965)

"Artificial intelligence is the science of making machines do things that would require intelligence if done by men." (Marvin Minsky, 1968)

"Intelligence has two parts, which we shall call the epistemological and the heuristic. The epistemological part is the representation of the world in such a form that the solution of problems follows from the facts expressed in the representation. The heuristic part is the mechanism that on the basis of the information solves the problem and decides what to do." (John McCarthy & Patrick J Hayes, "Some Philosophical Problems from the Standpoint of Artificial Intelligence", Machine Intelligence 4, 1969)

"Questions are the engines of intellect, the cerebral machines which convert energy to motion, and curiosity to controlled inquiry." (David H Fischer, "Historians’ Fallacies", 1970)

"Man is not a machine, [...] although man most certainly processes information, he does not necessarily process it in the way computers do. Computers and men are not species of the same genus. [...] No other organism, and certainly no computer, can be made to confront genuine human problems in human terms. [...] However much intelligence computers may attain, now or in the future, theirs must always be an intelligence alien to genuine human problems and concerns." (Joesph Weizenbaum, Computer Power and Human Reason: From Judgment to Calculation, 1976)

"Play is the only way the highest intelligence of humankind can unfold." (Joseph C Pearce, "Magical Child: Rediscovering Nature's Plan for Our Children", 1977)

"Because of mathematical indeterminancy and the uncertainty principle, it may be a law of nature that no nervous system is capable of acquiring enough knowledge to significantly predict the future of any other intelligent system in detail. Nor can intelligent minds gain enough self-knowledge to know their own future, capture fate, and in this sense eliminate free will." (Edward O Wilson, "On Human Nature", 1978)

"Collective intelligence emerges when a group of people work together effectively. Collective intelligence can be additive (each adds his or her part which together form the whole) or it can be synergetic, where the whole is greater than the sum of its parts." (Trudy and Peter Johnson-Lenz, "Groupware: Orchestrating the Emergence of Collective Intelligence", cca. 1980)

"Knowing a great deal is not the same as being smart; intelligence is not information alone but also judgement, the manner in which information is coordinated and used." (Carl Sagan, "Cosmos", 1980)

"The basic idea of cognitive science is that intelligent beings are semantic engines - in other words, automatic formal systems with interpretations under which they consistently make sense. We can now see why this includes psychology and artificial intelligence on a more or less equal footing: people and intelligent computers (if and when there are any) turn out to be merely different manifestations of the same underlying phenomenon. Moreover, with universal hardware, any semantic engine can in principle be formally imitated by a computer if only the right program can be found." (John Haugeland, "Semantic Engines: An introduction to mind design", 1981)

"There is a tendency to mistake data for wisdom, just as there has always been a tendency to confuse logic with values, intelligence with insight. Unobstructed access to facts can produce unlimited good only if it is matched by the desire and ability to find out what they mean and where they lead." (Norman Cousins, "Human Options : An Autobiographical Notebook", 1981)

"Cybernetic information theory suggests the possibility of assuming that intelligence is a feature of any feedback system that manifests a capacity for learning." (Paul Hawken et al, "Seven Tomorrows", 1982)

"We lose all intelligence by averaging." (John Naisbitt, "Megatrends: Ten New Directions Transforming Our Lives", 1982)

"Artificial intelligence is based on the assumption that the mind can be described as some kind of formal system manipulating symbols that stand for things in the world. Thus it doesn't matter what the brain is made of, or what it uses for tokens in the great game of thinking. Using an equivalent set of tokens and rules, we can do thinking with a digital computer, just as we can play chess using cups, salt and pepper shakers, knives, forks, and spoons. Using the right software, one system (the mind) can be mapped onto the other (the computer)." (George Johnson, Machinery of the Mind: Inside the New Science of Artificial Intelligence, 1986)

"Cybernetics is simultaneously the most important science of the age and the least recognized and understood. It is neither robotics nor freezing dead people. It is not limited to computer applications and it has as much to say about human interactions as it does about machine intelligence. Today’s cybernetics is at the root of major revolutions in biology, artificial intelligence, neural modeling, psychology, education, and mathematics. At last there is a unifying framework that suspends long-held differences between science and art, and between external reality and internal belief." (Paul Pangaro, "New Order From Old: The Rise of Second-Order Cybernetics and Its Implications for Machine Intelligence", 1988)

"A popular myth says that the invention of the computer diminishes our sense of ourselves, because it shows that rational thought is not special to human beings, but can be carried on by a mere machine. It is a short stop from there to the conclusion that intelligence is mechanical, which many people find to be an affront to all that is most precious and singular about their humanness." (Jeremy Campbell, "The improbable machine", 1989)

"Fuzziness, then, is a concomitant of complexity. This implies that as the complexity of a task, or of a system for performing that task, exceeds a certain threshold, the system must necessarily become fuzzy in nature. Thus, with the rapid increase in the complexity of the information processing tasks which the computers are called upon to perform, we are reaching a point where computers will have to be designed for processing of information in fuzzy form. In fact, it is the capability to manipulate fuzzy concepts that distinguishes human intelligence from the machine intelligence of current generation computers. Without such capability we cannot build machines that can summarize written text, translate well from one natural language to another, or perform many other tasks that humans can do with ease because of their ability to manipulate fuzzy concepts." (Lotfi A Zadeh, "The Birth and Evolution of Fuzzy Logic", 1989)

"Modeling underlies our ability to think and imagine, to use signs and language, to communicate, to generalize from experience, to deal with the unexpected, and to make sense out of the raw bombardment of our sensations. It allows us to see patterns, to appreciate, predict, and manipulate processes and things, and to express meaning and purpose. In short, it is one of the most essential activities of the human mind. It is the foundation of what we call intelligent behavior and is a large part of what makes us human. We are, in a word, modelers: creatures that build and use models routinely, habitually – sometimes even compulsively – to face, understand, and interact with reality." (Jeff Rothenberg, "The Nature of Modeling. In: Artificial Intelligence, Simulation, and Modeling", 1989)

"We haven't worked on ways to develop a higher social intelligence […] We need this higher intelligence to operate socially or we're not going to survive. […] If we don't manage things socially, individual high intelligence is not going to make much difference. [...] Ordinary thought in society is incoherent - it is going in all sorts of directions, with thoughts conflicting and canceling each other out. But if people were to think together in a coherent way, it would have tremendous power." (David Bohm, "New Age Journal", 1989)

"[Language comprehension] involves many components of intelligence: recognition of words, decoding them into meanings, segmenting word sequences into grammatical constituents, combining meanings into statements, inferring connections among statements, holding in short-term memory earlier concepts while processing later discourse, inferring the writer’s or speaker’s intentions, schematization of the gist of a passage, and memory retrieval in answering questions about the passage. [… The reader] constructs a mental representation of the situation and actions being described. […] Readers tend to remember the mental model they constructed from a text, rather than the text itself." (Gordon H Bower & Daniel G Morrow, 1990)

"The insight at the root of artificial intelligence was that these 'bits' (manipulated by computers) could just as well stand as symbols for concepts that the machine would combine by the strict rules of logic or the looser associations of psychology." (Daniel Crevier, "AI: The tumultuous history of the search for artificial intelligence", 1993)

"The leading edge of growth of intelligence is at the cultural and societal level. It is like a mind that is struggling to wake up. This is necessary because the most difficult problems we face are now collective ones. They are caused by complex global interactions and are beyond the scope of individuals to understand and solve. Individual mind, with its isolated viewpoints and narrow interests, is no longer enough." (Jeff Wright, "Basic Beliefs", [email] 1995)

"Adaptation is the process of changing a system during its operation in a dynamically changing environment. Learning and interaction are elements of this process. Without adaptation there is no intelligence." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Artificial intelligence comprises methods, tools, and systems for solving problems that normally require the intelligence of humans. The term intelligence is always defined as the ability to learn effectively, to react adaptively, to make proper decisions, to communicate in language or images in a sophisticated way, and to understand." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Learning is the process of obtaining new knowledge. It results in a better reaction to the same inputs at the next session of operation. It means improvement. It is a step toward adaptation. Learning is a major characteristic of intelligent systems." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Intelligence is: (a) the most complex phenomenon in the Universe; or (b) a profoundly simple process. The answer, of course, is (c) both of the above. It's another one of those great dualities that make life interesting." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999)

"It [collective intelligence] is a form of universally distributed intelligence, constantly enhanced, coordinated in real time, and resulting in the effective mobilization of skills. I'll add the following indispensable characteristic to this definition: The basis and goal of collective intelligence is mutual recognition and enrichment of individuals rather than the cult of fetishized or hypostatized communities." (Pierre Levy, "Collective Intelligence", 1999)

"It is, however, fair to say that very few applications of swarm intelligence have been developed. One of the main reasons for this relative lack of success resides in the fact that swarm-intelligent systems are hard to 'program', because the paths to problem solving are not predefined but emergent in these systems and result from interactions among individuals and between individuals and their environment as much as from the behaviors of the individuals themselves. Therefore, using a swarm-intelligent system to solve a problem requires a thorough knowledge not only of what individual behaviors must be implemented but also of what interactions are needed to produce such or such global behavior." (Eric Bonabeau et al, "Swarm Intelligence: From Natural to Artificial Systems", 1999)

"Once a computer achieves human intelligence it will necessarily roar past it." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999)

"[…] when software systems become so intractable that they can no longer be controlled, swarm intelligence offers an alternative way of designing an ‘intelligent’ systems, in which autonomy, emergence, and distributed functioning replace control, preprogramming, and centralization." (Eric Bonabeau et al, "Swarm Intelligence: From Natural to Artificial Systems", 1999)

"With the growing interest in complex adaptive systems, artificial life, swarms and simulated societies, the concept of 'collective intelligence' is coming more and more to the fore. The basic idea is that a group of individuals (e. g. people, insects, robots, or software agents) can be smart in a way that none of its members is. Complex, apparently intelligent behavior may emerge from the synergy created by simple interactions between individuals that follow simple rules." (Francis Heylighen, "Collective Intelligence and its Implementation on the Web", 1999)

"Ecological rationality uses reason – rational reconstruction – to examine the behavior of individuals based on their experience and folk knowledge, who are ‘naïve’ in their ability to apply constructivist tools to the decisions they make; to understand the emergent order in human cultures; to discover the possible intelligence embodied in the rules, norms and institutions of our cultural and biological heritage that are created from human interactions but not by deliberate human design. People follow rules without being able to articulate them, but they can be discovered." (Vernon L Smith, "Constructivist and ecological rationality in economics", 2002)

"But intelligence is not just a matter of acting or behaving intelligently. Behavior is a manifestation of intelligence, but not the central characteristic or primary definition of being intelligent. A moment's reflection proves this: You can be intelligent just lying in the dark, thinking and understanding. Ignoring what goes on in your head and focusing instead on behavior has been a large impediment to understanding intelligence and building intelligent machines." (Jeff Hawkins, "On Intelligence", 2004)

"Evolution moves towards greater complexity, greater elegance, greater knowledge, greater intelligence, greater beauty, greater creativity, and greater levels of subtle attributes such as love. […] Of course, even the accelerating growth of evolution never achieves an infinite level, but as it explodes exponentially it certainly moves rapidly in that direction." (Ray Kurzweil, "The Singularity is Near", 2005)

"Swarm Intelligence can be defined more precisely as: Any attempt to design algorithms or distributed problem-solving methods inspired by the collective behavior of the social insect colonies or other animal societies. The main properties of such systems are flexibility, robustness, decentralization and self-organization." ("Swarm Intelligence in Data Mining", Ed. Ajith Abraham et al, 2006))

"Swarm intelligence is sometimes also referred to as mob intelligence. Swarm intelligence uses large groups of agents to solve complicated problems. Swarm intelligence uses a combination of accumulation, teamwork, and voting to produce solutions. Accumulation occurs when agents contribute parts of a solution to a group. Teamwork occurs when different agents or subgroups of agents accidentally or purposefully work on different parts of a large problem. Voting occurs when agents propose solutions or components of solutions and the other agents vote explicitly by rating the proposal’s quality or vote implicitly by choosing whether to follow the proposal." (Michael J North & Charles M Macal, "Managing Business Complexity: Discovering Strategic Solutions with Agent-Based Modeling and Simulation", 2007)

"The brain and its cognitive mental processes are the biological foundation for creating metaphors about the world and oneself. Artificial intelligence, human beings’ attempt to transcend their biology, tries to enter into these scenarios to learn how they function. But there is another metaphor of the world that has its own particular landscapes, inhabitants, and laws. The brain provides the organic structure that is necessary for generating the mind, which in turn is considered a process that results from brain activity." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"Cultures are never merely intellectual constructs. They take form through the collective intelligence and memory, through a commonly held psychology and emotions, through spiritual and artistic communion." (Tariq Ramadan, "Islam and the Arab Awakening", 2012)

"An intuition is neither caprice nor a sixth sense but a form of unconscious intelligence." (Gerd Gigerenzer, "Risk Savvy", 2015)

"Artificial intelligence is the elucidation of the human learning process, the quantification of the human thinking process, the explication of human behavior, and the understanding of what makes intelligence possible." (Kai-Fu Lee, "AI Superpowers: China, Silicon Valley, and the New World Order", 2018)

"Deep learning has instead given us machines with truly impressive abilities but no intelligence. The difference is profound and lies in the absence of a model of reality." (Judea Pearl, "The Book of Why: The New Science of Cause and Effect", 2018)

"AI won‘t be fool proof in the future since it will only as good as the data and information that we give it to learn. It could be the case that simple elementary tricks could fool the AI algorithm and it may serve a complete waste of output as a result." (Zoltan Andrejkovics, "Together: AI and Human. On the Same Side", 2019)

"People who assume that extensions of modern machine learning methods like deep learning will somehow 'train up', or learn to be intelligent like humans, do not understand the fundamental limitations that are already known. Admitting the necessity of supplying a bias to learning systems is tantamount to Turing’s observing that insights about mathematics must be supplied by human minds from outside formal methods, since machine learning bias is determined, prior to learning, by human designers." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

More quotes on "Intelligence" at the-web-of-knowledge.blogspot.com.

01 November 2018

♟️Strategic Management: Game Theory (Just the Quotes)

"While these games are not typical for major economic processes, they contain some universally important traits of all games and the results derived from them are the basis of the general theory of games." (John von Neumann & Oskar Morgenstern, "Theory of Games and Economic Behavior", 1944)

"At present game theory has, in my opinion, two important uses, neither of them related to games nor to conflict directly. First, game theory stimulates us to think about conflict in a novel way. Second, game theory leads to some genuine impasses, that is, to situations where its axiomatic base is shown to be insufficient for dealing even theoretically with certain types of conflict situations... Thus, the impact is made on our thinking process themselves, rather than on the actual content of our knowledge." (Anatol Rapoport, Fights, games, and debates", 1960)

"Although the drama of games of strategy is strongly linked with the psychological aspects of the conflict, game theory is not concerned with these aspects. Game theory, so to speak, plays the board. It is concerned only with the logical aspects of strategy." (Anatol Rapoport, "The Use and Misuse of Game Theory", 1962)

"Game theory applies to a very different type of conflict, now technically called a game. The well-known games such as poker, chess, ticktacktoe and so forth are games in the strict technical Bark and counterbark sense. But what makes parlor games is not their entertainment value or detachment from real life." (Anatol Rapoport, "The Use and Misuse of Game Theory", Scientific American 207, 1962)

"Whether game theory leads to clear-cut solutions, to vague solutions, or to impasses, it does achieve one thing. In bringing techniques of logical and mathematical analysis gives men an opportunity to bring conflicts up from the level of fights, where the intellect is beclouded by passions, to the level of games, where the intellect has a chance to operate." (Anatol Rapoport, "The Use and Misuse of Game Theory", Scientific American 207, 1962)

"Now we are looking for another basic outlook on the world - the world as organization. Such a conception - if it can be substantiated - would indeed change the basic categories upon which scientific thought rests, and profoundly influence practical attitudes. This trend is marked by the emergence of a bundle of new disciplines such as cybernetics, information theory, general system theory, theories of games, of decisions, of queuing and others; in practical applications, systems analysis, systems engineering, operations research, etc. They are different in basic assumptions, mathematical techniques and aims, and they are often unsatisfactory and sometimes contradictory. They agree, however, in being concerned, in one way or another, with ‘systems’, ‘wholes’ or ‘organizations’; and in their totality, they herald a new approach." (Ludwig von Bertalanffy, "General System Theory", 1968)

"A proven theorem of game theory states that every game with complete information possesses a saddle point and therefore a solution." (Richard A Epstein, "The Theory of Gambling and Statistical Logic" [Revised Edition], 1977)

"Game theory is a collection of mathematical models designed to study situations involving conflict and/or cooperation. It allows for a multiplicity of decision makers who may have different preferences and objectives. Such models involve a variety of different solution concepts concerned with strategic optimization, stability, bargaining, compromise, equity and coalition formation." (Notices of the American Mathematical Society Vol. 26 (1), 1979)

"Game theory is a theory of strategic interaction. That is to say, it is a theory of rational behavior in social situations in which each player has to choose his moves on the basis of what he thinks the other players' countermoves are likely to be." (John Harsanyi, "Games with Incomplete Information", 1997)

"An equilibrium is not always an optimum; it might not even be good. This may be the most important discovery of game theory." (Ivar Ekeland, "Le meilleur des mondes possibles" ["The Best of All Possible Worlds"], 2000)

"Good decisions require that each decision-maker anticipate the decisions of the others. Game theory offers a systematic way of analysing strategic decision-making in interactive situations. [...] Game theory is not about 'playing' as usually understood. It is about conflict among rational but distrusting beings." (Geraldine Ryan & Seamus Coffey, "Games of Strategy", 2008)

"Game theory proposes a method called minimization-maximization (minimax) that determines the best possibility that is available to a player by following a decision tree that minimizes the opponent’s gain and maximizes the player’s own. This important algorithm is the basis for generating algorithms for chess programs." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"Game theory postulates rational behavior for each participant. Each player is conscious of the rules and behaves in accordance with them, each player has sufficient knowledge of the situation in which he or she is involved to be able to evaluate what the best option is when it comes to taking action (a move), and each player takes into account the decisions that might be made by other participants and their repercussions with respect to his or her own decision. Game theory about zero-sum games with two participants is relevant for chess. In this type of situation, each action that is favorable to one participant" (player) is proportionally unfavorable for the opponent. Thus, the gain of one represents the loss of the other." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"Game theory covers an incredibly broad spectrum of scenarios of cooperation and competition, but the field began with those resembling heads-up poker: two-person contests where one player’s gain is another player’s loss. Mathematicians analyzing these games seek to identify a so-called equilibrium: that is, a set of strategies that both players can follow such that neither player would want to change their own play, given the play of their opponent. It’s called an equilibrium because it’s stable - no amount of further reflection by either player will bring them to different choices. I’m content with my strategy, given yours, and you’re content with your strategy, given mine." (Brian Christian & Thomas L Griffiths, "Algorithms to Live By: The Computer Science of Human Decisions", 2016)

🔭Data Science: Black Boxes (Just the Quotes)

"The terms 'black box' and 'white box' are convenient and figurative expressions of not very well determined usage. I shall understand by a black box a piece of apparatus, such as four-terminal networks with two input and two output terminals, which performs a definite operation on the present and past of the input potential, but for which we do not necessarily have any information of the structure by which this operation is performed. On the other hand, a white box will be similar network in which we have built in the relation between input and output potentials in accordance with a definite structural plan for securing a previously determined input-output relation." (Norbert Wiener, "Cybernetics: Or Control and Communication in the Animal and the Machine", 1948)

"The definition of a ‘good model’ is when everything inside it is visible, inspectable and testable. It can be communicated effortlessly to others. A ‘bad model’ is a model that does not meet these standards, where parts are hidden, undefined or concealed and it cannot be inspected or tested; these are often labelled black box models." (Hördur V Haraldsson & Harald U Sverdrup, "Finding Simplicity in Complexity in Biogeochemical Modelling" [in "Environmental Modelling: Finding Simplicity in Complexity", Ed. by John Wainwright and Mark Mulligan, 2004])

"Operational thinking is about mapping relationships. It is about capturing interactions, interconnections, the sequence and flow of activities, and the rules of the game. It is about how systems do what they do, or the dynamic process of using elements of the structure to produce the desired functions. In a nutshell, it is about unlocking the black box that lies between system input and system output." (Jamshid Gharajedaghi, "Systems Thinking: Managing Chaos and Complexity A Platform for Designing Business Architecture" 3rd Ed., 2011)

"The transparency of Bayesian networks distinguishes them from most other approaches to machine learning, which tend to produce inscrutable 'black boxes'. In a Bayesian network you can follow every step and understand how and why each piece of evidence changed the network’s beliefs." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"A recurring theme in machine learning is combining predictions across multiple models. There are techniques called bagging and boosting which seek to tweak the data and fit many estimates to it. Averaging across these can give a better prediction than any one model on its own. But here a serious problem arises: it is then very hard to explain what the model is (often referred to as a 'black box'). It is now a mixture of many, perhaps a thousand or more, models." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Deep neural networks have an input layer and an output layer. In between, are “hidden layers” that process the input data by adjusting various weights in order to make the output correspond closely to what is being predicted. [...] The mysterious part is not the fancy words, but that no one truly understands how the pattern recognition inside those hidden layers works. That’s why they’re called 'hidden'. They are an inscrutable black box - which is okay if you believe that computers are smarter than humans, but troubling otherwise." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"The concept of integrated information is clearest when applied to networks. Imagine a black box with input and output terminals. Inside are some electronics, such as a network with logic elements (AND, OR, and so on) wired together. Viewed from the outside, it will usually not be possible to deduce the circuit layout simply by examining the cause–effect relationship between inputs and outputs, because functionally equivalent black boxes can be built from very different circuits. But if the box is opened, it’s a different story. Suppose you use a pair of cutters to sever some wires in the network. Now rerun the system with all manner of inputs. If a few snips dramatically alter the outputs, the circuit can be described as highly integrated, whereas in a circuit with low integration the effect of some snips may make no difference at all." (Paul Davies, "The Demon in the Machine: How Hidden Webs of Information Are Solving the Mystery of Life", 2019)

"Big data is revolutionizing the world around us, and it is easy to feel alienated by tales of computers handing down decisions made in ways we don’t understand. I think we’re right to be concerned. Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"If the data that go into the analysis are flawed, the specific technical details of the analysis don’t matter. One can obtain stupid results from bad data without any statistical trickery. And this is often how bullshit arguments are created, deliberately or otherwise. To catch this sort of bullshit, you don’t have to unpack the black box. All you have to do is think carefully about the data that went into the black box and the results that came out. Are the data unbiased, reasonable, and relevant to the problem at hand? Do the results pass basic plausibility checks? Do they support whatever conclusions are drawn?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well - but it may do so by chance. As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)