13 April 2006

🖍️Phillip I Good - Collected Quotes

"A major problem with many studies is that the population of interest is not adequately defined before the sample is drawn. Don’t make this mistake. A second major source of error is that the sample proves to have been drawn from a different population than was originally envisioned." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"A permutation test based on the original observations is appropriate only if one can assume that under the null hypothesis the observations are identically distributed in each of the populations from which the samples are drawn. If we cannot make this assumption, we will need to transform the observations, throwing away some of the information about them so that the distributions of the transformed observations are identical." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"A well-formulated hypothesis will be both quantifiable and testable - that is, involve measurable quantities or refer to items that may be assigned to mutually exclusive categories. [...] When the objective of our investigations is to arrive at some sort of conclusion, then we need to have not only a hypothesis in mind, but also one or more potential alternative hypotheses." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Before we initiate data collection, we must have a firm idea of what we will measure. A second fundamental principle is also applicable to both experiments and surveys: Collect exact values whenever possible. Worry about grouping them in interval or discrete categories later. […] You can always group your results (and modify your groupings) after a study is completed. If after-the-fact grouping is a possibility, your design should state how the grouping will be determined; otherwise there will be the suspicion that you chose the grouping to obtain desired results." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Estimation methods should be impartial. Decisions should not depend on the accidental and quite irrelevant labeling of the samples. Nor should decisions depend on the units in which the measurements are made." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Every statistical procedure relies on certain assumptions for correctness. Errors in testing hypotheses come about either because the assumptions underlying the chosen test are not satisfied or because the chosen test is less powerful than other competing procedures."(Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"[…] finding at least one cluster of events in time or in spaceh as a greater probability than finding no clusters at all (equally spaced events)." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Graphical illustrations should be simple and pleasing to the eye, but the presentation must remain scientific. In other words, we want to avoid those graphical features that are purely decorative while keeping a critical eye open for opportunities to enhance the scientific inference we expect from the reader. A good graphical design should maximize the proportion of the ink used for communicating scientific information in the overall display." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"If the sample is not representative of the population because the sample is small or biased, not selected at random, or its constituents are not independent of one another, then the bootstrap will fail. […] For a given size sample, bootstrap estimates of percentiles in the tails will always be less accurate than estimates of more centrally located percentiles. Similarly, bootstrap interval estimates for the variance of a distribution will always be less accurate than estimates of central location such as the mean or median because the variance depends strongly upon extreme values in the population." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"More important than comparing the means of populations can be determining why the variances are different." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Most statistical procedures rely on two fundamental assumptions: that the observations are independent of one another and that they are identically distributed. If your methods of collection fail to honor these assumptions, then your analysis must fail also." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Never assign probabilities to the true state of nature, but only to the validity of your own predictions." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The greatest error associated with the use of statistical procedures is to make the assumption that one single statistical methodology can suffice for all applications. […] But one methodology can never be better than another, nor can estimation replace hypothesis testing or vice versa. Every methodology has a proper domain of application and another set of applications for which it fails. Every methodology has its drawbacks and its advantages, its assumptions, and its sources of error." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The sources of error in applying statistical procedures are legion and include all of the following: (•) Using the same set of data both to formulate hypotheses and to test them. (•) Taking samples from the wrong population or failing to specify the population(s) about which inferences are to be made in advance. (•) Failing to draw random, representative samples. (•) Measuring the wrong variables or failing to measure what you’d hoped to measure. (•) Using inappropriate or inefficient statistical methods. (•) Failing to validate models. But perhaps the most serious source of error lies in letting statistical procedures make decisions for you." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The vast majority of errors in estimation stem from a failure to measure what one wanted to measure or what one thought one was measuring. Misleading definitions, inaccurate measurements, errors in recording and transcription, and confounding variables plague results. To forestall such errors, review your data collection protocols and procedure manuals before you begin, run several preliminary trials, record potential confounding variables, monitor data collection, and review the data as they are collected." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The vast majority of errors in Statistics - and not incidentally, in most human endeavors - arise from a reluctance (or even an inability) to plan. Some demon (or demonic manager) seems to be urging us to cross the street before we’ve had the opportunity to look both ways. Even on those rare occasions when we do design an experiment, we seem more obsessed with the mechanics than with the concepts that underlie it." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Use statistics as a guide to decision making rather than a mandate." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"When we assert that for a given population a percentage of samples will have a specific composition, this also is a deduction. But when we make an inductive generalization about a population based upon our analysis of a sample, we are on shakier ground. It is one thing to assert that if an observation comes from a normal distribution with mean zero, the probability is one-half that it is positive. It is quite another if, on observing that half the observations in the sample are positive, we assert that half of all the possible observations that might be drawn from that population will be positive also." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"While a null hypothesis can facilitate statistical inquiry - an exact permutation test is impossible without it - it is never mandated. In any event, virtually any quantifiable hypothesis can be converted into null form. There is no excuse and no need to be content with a meaningless null. […] We must specify our alternatives before we commence an analysis, preferably at the same time we design our study." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

🖍️Alex Reinhart - Collected Quotes

"Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’. Just try several different analyses offered by your statistical software until one of them turns up an interesting result, and then pretend this is the analysis you intended to do all along. Without psychic powers, it’s almost impossible to tell when a published result was obtained through data torture." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"In science, it is important to limit two kinds of errors: false positives, where you conclude there is an effect when there isn’t, and false negatives, where you fail to notice a real effect. In some sense, false positives and false negatives are flip sides of the same coin. If we’re too ready to jump to conclusions about effects, we’re prone to get false positives; if we’re too conservative, we’ll err on the side of false negatives." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"In exploratory data analysis, you don’t choose a hypothesis to test in advance. You collect data and poke it to see what interesting details might pop out, ideally leading to new hypotheses and new experiments. This process involves making numerous plots, trying a few statistical analyses, and following any promising leads. But aimlessly exploring data means a lot of opportunities for false positives and truth inflation." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"In short, statistical significance does not mean your result has any practical significance. As for statistical insignificance, it doesn’t tell you much. A statistically insignificant difference could be nothing but noise, or it could represent a real effect that can be pinned down only with more data." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"More useful than a statement that an experiment’s results were statistically insignificant is a confidence interval giving plausible sizes for the effect. Even if the confidence interval includes zero, its width tells you a lot: a narrow interval covering zero tells you that the effect is most likely small (which may be all you need to know, if a small effect is not practically useful), while a wide interval clearly shows that the measurement was not precise enough to draw conclusions." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Much of experimental science comes down to measuring differences. [...] We use statistics to make judgments about these kinds of differences. We will always observe some difference due to luck and random variation, so statisticians talk about statistically significant differences when the difference is larger than could easily be produced by luck. So first we must learn how to make that decision." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Overlapping confidence intervals do not mean two values are not significantly different. Checking confidence intervals or standard errors will mislead. It’s always best to use the appropriate hypothesis test instead. Your eyeball is not a well-defined statistical procedure." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"The p value is the probability, under the assumption that there is no true effect or no true difference, of collecting data that shows a difference equal to or more extreme than what you actually observed. [...] Remember, a p value is not a measure of how right you are or how important a difference is. Instead, think of it as a measure of surprise." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"There is exactly one situation when visually checking confidence intervals works, and it is when comparing the confidence interval against a fixed value, rather than another confidence interval. If you want to know whether a number is plausibly zero, you may check to see whether its confidence interval overlaps with zero. There are, of course, formal statistical procedures that generate confidence intervals that can be compared by eye and that even correct for multiple comparisons automatically. Unfortunately, these procedures work only in certain circumstances;" (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"When statisticians are asked for an interesting paradoxical result in statistics, they often turn to Simpson’s paradox. Simpson’s paradox arises whenever an apparent trend in data, caused by a confounding variable, can be eliminated or reversed by splitting the data into natural groups." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

12 April 2006

🖍️Kaiser Fung - Collected Quotes

"Numbers already rule your world. And you must not be in the dark about this fact. See how some applied scientists use statistical thinking to make our lives better. You will be amazed how you can use numbers to make everyday decisions in your own life." (Kaiser Fung, "Numbers Rule the World", 2010)

"The issue of group differences is fundamental to statistical thinking. The heart of this matter concerns which groups should be aggregated and which shouldn’t." (Kaiser Fung, "Numbers Rule the World", 2010)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010) 

"Having NUMBERSENSE means: (•) Not taking published data at face value; (•) Knowing which questions to ask; (•) Having a nose for doctored statistics. [...] NUMBERSENSE is that bit of skepticism, urge to probe, and desire to verify. It’s having the truffle hog’s nose to hunt the delicacies. Developing NUMBERSENSE takes training and patience. It is essential to know a few basic statistical concepts. Understanding the nature of means, medians, and percentile ranks is important. Breaking down ratios into components facilitates clear thinking. Ratios can also be interpreted as weighted averages, with those weights arranged by rules of inclusion and exclusion. Missing data must be carefully vetted, especially when they are substituted with statistical estimates. Blatant fraud, while difficult to detect, is often exposed by inconsistency." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Measuring anything subjective always prompts perverse behavior. [...] All measurement systems are subject to abuse." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Missing data is the blind spot of statisticians. If they are not paying full attention, they lose track of these little details. Even when they notice, many unwittingly sway things our way. Most ranking systems ignore missing values." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"No subjective metric can escape strategic gaming [...] The possibility of mischief is bottomless. Fighting ratings is fruitless, as they satisfy a very human need. If one scheme is beaten down, another will take its place and wear its flaws. Big Data just deepens the danger. The more complex the rating formulas, the more numerous the opportunities there are to dress up the numbers. The larger the data sets, the harder it is to audit them." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"NUMBERSENSE is not taking numbers at face value. NUMBERSENSE is the ability to relate numbers here to numbers there, to separate the credible from the chimerical. It means drawing the dividing line between science hour and story time." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Statistical models in the social sciences rely on correlations, generally not causes, of our behavior. It is inevitable that such models of reality do not capture reality well. This explains the excess of false positives and false negatives." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Statistically speaking, the best predictive models are gems." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Statisticians set a high bar when they assign a cause to an effect. [...] A model that ignores cause–effect relationships cannot attain the status of a model in the physical sciences. This is a structural limitation that no amount of data - not even Big Data - can surmount." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"The urge to tinker with a formula is a hunger that keeps coming back. Tinkering almost always leads to more complexity. The more complicated the metric, the harder it is for users to learn how to affect the metric, and the less likely it is to improve it." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Until a new metric generates a body of data, we cannot test its usefulness. Lots of novel measures hold promise only on paper." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Usually, it is impossible to restate past data. As a result, all history must be whitewashed and measurement starts from scratch." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

🖍️Bart Kosko - Collected Quotes

"A bell curve shows the 'spread' or variance in our knowledge or certainty. The wider the bell the less we know. An infinitely wide bell is a flat line. Then we know nothing. The value of the quantity, position, or speed could lie anywhere on the axis. An infinitely narrow bell is a spike that is infinitely tall. Then we have complete knowledge of the value of the quantity. The uncertainty principle says that as one bell curve gets wider the other gets thinner. As one curve peaks the other spreads. So if the position bell curve becomes a spike and we have total knowledge of position, then the speed bell curve goes flat and we have total uncertainty (infinite variance) of speed." (Bart Kosko, "Fuzzy Thinking: The new science of fuzzy logic", 1993)

"Bivalence trades accuracy for simplicity. Binary outcomes of yes and no, white and black, true and false simplify math and computer processing. You can work with strings of 0s and 1s more easily than you can work with fractions. But bivalence requires some force fitting and rounding off [...] Bivalence holds at cube corners. Multivalence holds everywhere else." (Bart Kosko, "Fuzzy Thinking: The new science of fuzzy logic", 1993)

"Fuzziness has a formal name in science: multivalence. The opposite of fuzziness is bivalence or two-valuedness, two ways to answer each question, true or false, 1 or 0. Fuzziness means multivalence. It means three or more options, perhaps an infinite spectrum of options, instead of just two extremes. It means analog instead of binary, infinite shades of gray between black and white." (Bart Kosko, "Fuzzy Thinking: The new science of fuzzy logic", 1993)

"The binary logic of modern computers often falls short when describing the vagueness of the real world. Fuzzy logic offers more graceful alternatives." (Bart Kosko & Satoru Isaka, "Fuzzy Logic,” Scientific American Vol. 269, 1993)

"A bit involves both probability and an experiment that decides a binary or yes-no question. Consider flipping a coin. One bit of in-formation is what we learn from the flip of a fair coin. With an unfair or biased coin the odds are other than even because either heads or tails is more likely to appear after the flip. We learn less from flipping the biased coin because there is less surprise in the outcome on average. Shannon's bit-based concept of entropy is just the average information of the experiment. What we gain in information from the coin flip we lose in uncertainty or entropy." (Bart Kosko, "Noise", 2006)

"A signal has a finite-length frequency spectrum only if it lasts infinitely long in time. So a finite spectrum implies infinite time and vice versa. The reverse also holds in the ideal world of mathematics: A signal is finite in time only if it has a frequency spectrum that is infinite in extent." (Bart Kosko, "Noise", 2006)

"Bell curves don't differ that much in their bells. They differ in their tails. The tails describe how frequently rare events occur. They describe whether rare events really are so rare. This leads to the saying that the devil is in the tails." (Bart Kosko, "Noise", 2006)

"Chaos can leave statistical footprints that look like noise. This can arise from simple systems that are deterministic and not random. [...] The surprising mathematical fact is that most systems are chaotic. Change the starting value ever so slightly and soon the system wanders off on a new chaotic path no matter how close the starting point of the new path was to the starting point of the old path. Mathematicians call this sensitivity to initial conditions but many scientists just call it the butterfly effect. And what holds in math seems to hold in the real world - more and more systems appear to be chaotic." (Bart Kosko, "Noise", 2006)

"'Chaos' refers to systems that are very sensitive to small changes in their inputs. A minuscule change in a chaotic communication system can flip a 0 to a 1 or vice versa. This is the so-called butterfly effect: Small changes in the input of a chaotic system can produce large changes in the output. Suppose a butterfly flaps its wings in a slightly different way. can change its flight path. The change in flight path can in time change how a swarm of butterflies migrates." (Bart Kosko, "Noise", 2006)

"I wage war on noise every day as part of my work as a scientist and engineer. We try to maximize signal-to-noise ratios. We try to filter noise out of measurements of sounds or images or anything else that conveys information from the world around us. We code the transmission of digital messages with extra 0s and 1s to defeat line noise and burst noise and any other form of interference. We design sophisticated algorithms to track noise and then cancel it in headphones or in a sonogram. Some of us even teach classes on how to defeat this nemesis of the digital age. Such action further conditions our anti-noise reflexes." (Bart Kosko, "Noise", 2006)

"Linear systems do not benefit from noise because the output of a linear system is just a simple scaled version of the input [...] Put noise in a linear system and you get out noise. Sometimes you get out a lot more noise than you put in. This can produce explosive effects in feedback systems that take their own outputs as inputs." (Bart Kosko, "Noise", 2006)

"Many scientists who work not just with noise but with probability make a common mistake: They assume that a bell curve is automatically Gauss's bell curve. Empirical tests with real data can often show that such an assumption is false. The result can be a noise model that grossly misrepresents the real noise pattern. It also favors a limited view of what counts as normal versus non-normal or abnormal behavior. This assumption is especially troubling when applied to human behavior. It can also lead one to dismiss extreme data as error when in fact the data is part of a pattern." (Bart Kosko, "Noise", 2006)

"Noise is a signal we don't like. Noise has two parts. The first has to do with the head and the second with the heart. The first part is the scientific or objective part: Noise is a signal. [...] The second part of noise is the subjective part: It deals with values. It deals with how we draw the fuzzy line between good signals and bad signals. Noise signals are the bad signals. They are the unwanted signals that mask or corrupt our preferred signals. They not only interfere but they tend to interfere at random." (Bart Kosko, "Noise", 2006)

"Noise is an unwanted signal. A signal is anything that conveys information or ultimately anything that has energy. The universe consists of a great deal of energy. Indeed a working definition of the universe is all energy anywhere ever. So the answer turns on how one defines what it means to be wanted and by whom." (Bart Kosko, "Noise", 2006)

"The central limit theorem differs from laws of large numbers because random variables vary and so they differ from constants such as population means. The central limit theorem says that certain independent random effects converge not to a constant population value such as the mean rate of unemployment but rather they converge to a random variable that has its own Gaussian bell-curve description." (Bart Kosko, "Noise", 2006)

"The flaw in the classical thinking is the assumption that variance equals dispersion. Variance tends to exaggerate outlying data because it squares the distance between the data and their mean. This mathematical artifact gives too much weight to rotten apples. It can also result in an infinite value in the face of impulsive data or noise. [...] Yet dispersion remains an elusive concept. It refers to the width of a probability bell curve in the special but important case of a bell curve. But most probability curves don't have a bell shape. And its relation to a bell curve's width is not exact in general. We know in general only that the dispersion increases as the bell gets wider. A single number controls the dispersion for stable bell curves and indeed for all stable probability curves - but not all bell curves are stable curves." (Bart Kosko, "Noise", 2006)

More quotes from Bart Kosko at QuotableMath.blogspot.com.

🖍️Tim Harford - Collected Quotes

"An algorithm, meanwhile, is a step-by-step recipe for performing a series of actions, and in most cases 'algorithm' means simply 'computer program'." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Big data is revolutionizing the world around us, and it is easy to feel alienated by tales of computers handing down decisions made in ways we don’t understand. I think we’re right to be concerned. Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Each decision about what data to gather and how to analyze them is akin to standing on a pathway as it forks left and right and deciding which way to go. What seems like a few simple choices can quickly multiply into a labyrinth of different possibilities. Make one combination of choices and you’ll reach one conclusion; make another, equally reasonable, and you might find a very different pattern in the data." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Each of us is sweating data, and those data are being mopped up and wrung out into oceans of information. Algorithms and large datasets are being used for everything from finding us love to deciding whether, if we are accused of a crime, we go to prison before the trial or are instead allowed to post bail. We all need to understand what these data are and how they can be exploited." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Good statistics are not a trick, although they are a kind of magic. Good statistics are not smoke and mirrors; in fact, they help us see more clearly. Good statistics are like a telescope for an astronomer, a microscope for a bacteriologist, or an X-ray for a radiologist. If we are willing to let them, good statistics help us see things about the world around us and about ourselves - both large and small - that we would not be able to see in any other way." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Ideally, a decision maker or a forecaster will combine the outside view and the inside view - or, similarly, statistics plus personal experience. But it’s much better to start with the statistical view, the outside view, and then modify it in the light of personal experience than it is to go the other way around. If you start with the inside view you have no real frame of reference, no sense of scale - and can easily come up with a probability that is ten times too large, or ten times too small." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"If we don’t understand the statistics, we’re likely to be badly mistaken about the way the world is. It is all too easy to convince ourselves that whatever we’ve seen with our own eyes is the whole truth; it isn’t. Understanding causation is tough even with good statistics, but hopeless without them. [...] And yet, if we understand only the statistics, we understand little. We need to be curious about the world that we see, hear, touch, and smell, as well as the world we can examine through a spreadsheet." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"[…] in a world where so many people seem to hold extreme views with strident certainty, you can deflate somebody’s overconfidence and moderate their politics simply by asking them to explain the details." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"It’d be nice to fondly imagine that high-quality statistics simply appear in a spreadsheet somewhere, divine providence from the numerical heavens. Yet any dataset begins with somebody deciding to collect the numbers. What numbers are and aren’t collected, what is and isn’t measured, and who is included or excluded are the result of all-too-human assumptions, preconceptions, and oversights." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Making big data work is harder than it seems. Statisticians have spent the past two hundred years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster, and cheaper these days, but we must not pretend that the traps have all been made safe. They have not." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Many people have strong intuitions about whether they would rather have a vital decision about them made by algorithms or humans. Some people are touchingly impressed by the capabilities of the algorithms; others have far too much faith in human judgment. The truth is that sometimes the algorithms will do better than the humans, and sometimes they won’t. If we want to avoid the problems and unlock the promise of big data, we’re going to need to assess the performance of the algorithms on a case-by-case basis. All too often, this is much harder than it should be. […] So the problem is not the algorithms, or the big datasets. The problem is a lack of scrutiny, transparency, and debate." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Much of the data visualization that bombards us today is decoration at best, and distraction or even disinformation at worst. The decorative function is surprisingly common, perhaps because the data visualization teams of many media organizations are part of the art departments. They are led by people whose skills and experience are not in statistics but in illustration or graphic design. The emphasis is on the visualization, not on the data. It is, above all, a picture." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Numbers can easily confuse us when they are unmoored from a clear definition." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Premature enumeration is an equal-opportunity blunder: the most numerate among us may be just as much at risk as those who find their heads spinning at the first mention of a fraction. Indeed, if you’re confident with numbers you may be more prone than most to slicing and dicing, correlating and regressing, normalizing and rebasing, effortlessly manipulating the numbers on the spreadsheet or in the statistical package - without ever realizing that you don’t fully understand what these abstract quantities refer to. Arguably this temptation lay at the root of the last financial crisis: the sophistication of mathematical risk models obscured the question of how, exactly, risks were being measured, and whether those measurements were something you’d really want to bet your global banking system on." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Sample error reflects the risk that, purely by chance, a randomly chosen sample of opinions does not reflect the true views of the population. The 'margin of error' reported in opinion polls reflects this risk, and the larger the sample, the smaller the margin of error. […] sampling error has a far more dangerous friend: sampling bias. Sampling error is when a randomly chosen sample doesn’t reflect the underlying population purely by chance; sampling bias is when the sample isn’t randomly chosen at all." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Statistical metrics can show us facts and trends that would be impossible to see in any other way, but often they’re used as a substitute for relevant experience, by managers or politicians without specific expertise or a close-up view." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Statisticians are sometimes dismissed as bean counters. The sneering term is misleading as well as unfair. Most of the concepts that matter in policy are not like beans; they are not merely difficult to count, but difficult to define. Once you’re sure what you mean by 'bean', the bean counting itself may come more easily. But if we don’t understand the definition, then there is little point in looking at the numbers. We have fooled ourselves before we have begun."(Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"So information is beautiful - but misinformation can be beautiful, too. And producing beautiful misinformation is becoming easier than ever." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The contradiction between what we see with our own eyes and what the statistics claim can be very real. […] The truth is more complicated. Our personal experiences should not be dismissed along with our feelings, at least not without further thought. Sometimes the statistics give us a vastly better way to understand the world; sometimes they mislead us. We need to be wise enough to figure out when the statistics are in conflict with everyday experience - and in those cases, which to believe." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The world is full of patterns that are too subtle or too rare to detect by eyeballing them, and a pattern doesn’t need to be very subtle or rare to be hard to spot without a statistical lens." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The whole discipline of statistics is built on measuring or counting things. […] it is important to understand what is being measured or counted, and how. It is surprising how rarely we do this. Over the years, as I found myself trying to lead people out of statistical mazes week after week, I came to realize that many of the problems I encountered were because people had taken a wrong turn right at the start. They had dived into the mathematics of a statistical claim - asking about sampling errors and margins of error, debating if the number is rising or falling, believing, doubting, analyzing, dissecting - without taking the ti- me to understand the first and most obvious fact: What is being measured, or counted? What definition is being used?" (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Those of us in the business of communicating ideas need to go beyond the fact-check and the statistical smackdown. Facts are valuable things, and so is fact-checking. But if we really want people to understand complex issues, we need to engage their curiosity. If people are curious, they will learn." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Unless we’re collecting data ourselves, there’s a limit to how much we can do to combat the problem of missing data. But we can and should remember to ask who or what might be missing from the data we’re being told about. Some missing numbers are obvious […]. Other omissions show up only when we take a close look at the claim in question." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"We don’t need to become emotionless processors of numerical information - just noticing our emotions and taking them into account may often be enough to improve our judgment. Rather than requiring superhuman control over our emotions, we need simply to develop good habits." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"We filter new information. If it accords with what we expect, we’ll be more likely to accept it. […] Our brains are always trying to make sense of the world around us based on incomplete information. The brain makes predictions about what it expects, and tends to fill in the gaps, often based on surprisingly sparse data." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"We should conclude nothing because that pair of numbers alone tells us very little. If we want to understand what’s happening, we need to step back and take in a broader perspective." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"[…] when it comes to interpreting the world around us, we need to realize that our feelings can trump our expertise. […] The more extreme the emotional reaction, the harder it is to think straight. […] It is not easy to master our emotions while assessing information that matters to us, not least because our emotions can lead us astray in different directions. […] We often find ways to dismiss evidence that we don’t like. And the opposite is true, too: when evidence seems to support our preconceptions, we are less likely to look too closely for flaws." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"When we are trying to understand a statistical claim - any statistical claim - we need to start by asking ourselves what the claim actually means. [...] A surprising statistical claim is a challenge to our existing worldview. It may provoke an emotional response - even a fearful one." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020) 

🖍️Nikola K Kasabov - Collected Quotes

"A strategy is usually expressed by a set of heuristic rules. The heuristic rules ease the process of searching for an optimal solution. The process is usually iterative and at one step either the global optimum for the whole problem (state) space is found and the process stops, or a local optimum for a subspace of the state space of the problem is found and the problem continues, if it is possible to improve." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Adaptation is the process of changing a system during its operation in a dynamically changing environment. Learning and interaction are elements of this process. Without adaptation there is no intelligence." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

 "An artificial neural network (or simply a neural network) is a biologically inspired computational model that consists of processing elements (neurons) and connections between them, as well as of training and recall algorithms." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Artificial intelligence comprises methods, tools, and systems for solving problems that normally require the intelligence of humans. The term intelligence is always defined as the ability to learn effectively, to react adaptively, to make proper decisions, to communicate in language or images in a sophisticated way, and to understand." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996) 

"Data obtained without any external disturbance or corruption are called clean; noisy data mean that a small random ingredient is added to the clean data." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Fuzzy systems are excellent tools for representing heuristic, commonsense rules. Fuzzy inference methods apply these rules to data and infer a solution. Neural networks are very efficient at learning heuristics from data. They are 'good problem solvers' when past data are available. Both fuzzy systems and neural networks are universal approximators in a sense, that is, for a given continuous objective function there will be a fuzzy system and a neural network which approximate it to any degree of accuracy." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Fuzzy systems are rule-based expert systems based on fuzzy rules and fuzzy inference. Fuzzy rules represent in a straightforward way 'commonsense' knowledge and skills, or knowledge that is subjective, ambiguous, vague, or contradictory." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Generalization is the process of matching new, unknown input data with the problem knowledge in order to obtain the best possible solution, or one close to it. Generalization means reacting properly to new situations, for example, recognizing new images, or classifying new objects and situations. Generalization can also be described as a transition from a particular object description to a general concept description. This is a major characteristic of all intelligent systems." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996) 

"Generally speaking, problem knowledge for solving a given problem may consist of heuristic rules or formulas that comprise the explicit knowledge, and past-experience data that comprise the implicit, hidden knowledge. Knowledge represents links between the domain space and the solution space, the space of the independent variables and the space of the dependent variables." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Heuristic (it is of Greek origin) means discovery. Heuristic methods are based on experience, rational ideas, and rules of thumb. Heuristics are based more on common sense than on mathematics. Heuristics are useful, for example, when the optimal solution needs an exhaustive search that is not realistic in terms of time. In principle, a heuristic does not guarantee the best solution, but a heuristic solution can provide a tremendous shortcut in cost and time." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Heuristic methods may aim at local optimization rather than at global optimization, that is, the algorithm optimizes the solution stepwise, finding the best solution at each small step of the solution process and 'hoping' that the global solution, which comprises the local ones, would be satisfactory." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Inference is the process of matching current facts from the domain space to the existing knowledge and inferring new facts. An inference process is a chain of matchings. The intermediate results obtained during the inference process are matched against the existing knowledge. The length of the chain is different. It depends on the knowledge base and on the inference method applied." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Learning is the process of obtaining new knowledge. It results in a better reaction to the same inputs at the next session of operation. It means improvement. It is a step toward adaptation. Learning is a major characteristic of intelligent systems." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Prediction (forecasting) is the process of generating information for the possible future development of a process from data about its past and its present development." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Representation is the process of transforming existing problem knowledge to some of the known knowledge-engineering schemes in order to process it by applying knowledge-engineering methods. The result of the representation process is the problem knowledge base in a computer format." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"The most distinguishing property of fuzzy logic is that it deals with fuzzy propositions, that is, propositions which contain fuzzy variables and fuzzy values, for example, 'the temperature is high', 'the height is short'. The truth values for fuzzy propositions are not TRUE/FALSE only, as is the case in propositional boolean logic, but include all the grayness between two extreme values." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Validation is the process of testing how good the solutions produced by a system are. The results produced by a system are usually compared with the results obtained either by experts or by other systems. Validation is an extremely important part of the process of developing every knowledge-based system. Without comparing the results produced by the system with reality, there is little point in using it." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

11 April 2006

🖍️Matthew Kirk - Collected Quotes

"A good proxy for complexity in a machine learning model is how fast it takes to train it." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Cross-validation is a method of splitting all of your data into two parts: training and validation. The training data is used to build the machine learning model, whereas the validation data is used to validate that the model is doing what is expected. This increases our ability to find and determine the underlying errors in a model." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"In statistics, there is a measure called power that denotes the probability of not finding a false negative. As power goes up, false negatives go down. However, what influences this measure is the sample size. If our sample size is too small, we just don’t have enough information to come up with a good solution." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Machine learning is a science and requires an objective approach to problems. Just like the scientific method, test-driven development can aid in solving a problem. The reason that TDD and the scientific method are so similar is because of these three shared characteristics: Both propose that the solution is logical and valid. Both share results through documentation and work over time. Both work in feedback loops." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Machine learning is the intersection between theoretically sound computer science and practically noisy data. Essentially, it’s about machines making sense out of data in much the same way that humans do." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Machine learning is well suited for the unpredictable future, because most algorithms learn from new information. But as new information is found, it can also come in unstable forms, and new issues can arise that weren’t thought of before. We don’t know what we don’t know. When processing new information, it’s sometimes hard to tell whether our model is working." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Precision and recall are ways of monitoring the power of the machine learning implementation. Precision is a metric that monitors the percentage of true positives. […] Recall is the ratio of true positives to true positive plus false negatives." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Supervised learning, or function approximation, is simply fitting data to a function of any variety.  […] Unsupervised learning involves figuring out what makes the data special. […] Reinforcement learning involves figuring out how to play a multistage game with rewards and payoffs. Think of it as the algorithms that optimize the life of something." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Underfitting is when a model doesn’t take into account enough information to accurately model real life. For example, if we observed only two points on an exponential curve, we would probably assert that there is a linear relationship there. But there may not be a pattern, because there are only two points to reference. [...] It seems that the best way to mitigate underfitting a model is to give it more information, but this actually can be a problem as well. More data can mean more noise and more problems. Using too much data and too complex of a model will yield something that works for that particular data set and nothing else." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

🖍️DeWayne R Derryberry - Collected Quotes

"A complete data analysis will involve the following steps: (i) Finding a good model to fit the signal based on the data. (ii) Finding a good model to fit the noise, based on the residuals from the model. (iii) Adjusting variances, test statistics, confidence intervals, and predictions, based on the model for the noise.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"A key difference between a traditional statistical problems and a time series problem is that often, in time series, the errors are not independent." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"A stationary time series is one that has had trend elements (the signal) removed and that has a time invariant pattern in the random noise. In other words, although there is a pattern of serial correlation in the noise, that pattern seems to mimic a fixed mathematical model so that the same model fits any arbitrary, contiguous subset of the noise." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

"A wide variety of statistical procedures (regression, t-tests, ANOVA) require three assumptions: (i) Normal observations or errors. (ii) Independent observations (or independent errors, which is equivalent, in normal linear models to independent observations). (iii) Equal variance - when that is appropriate (for the one-sample t-test, for example, there is nothing being compared, so equal variances do not apply).(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Both real and simulated data are very important for data analysis. Simulated data is useful because it is known what process generated the data. Hence it is known what the estimated signal and noise should look like (simulated data actually has a well-defined signal and well-defined noise). In this setting, it is possible to know, in a concrete manner, how well the modeling process has worked." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

"Either a logarithmic or a square-root transformation of the data would produce a new series more amenable to fit a simple trigonometric model. It is often the case that periodic time series have rounded minima and sharp-peaked maxima. In these cases, the square root or logarithmic transformation seems to work well most of the time.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"For a confidence interval, the central limit theorem plays a role in the reliability of the interval because the sample mean is often approximately normal even when the underlying data is not. A prediction interval has no such protection. The shape of the interval reflects the shape of the underlying distribution. It is more important to examine carefully the normality assumption by checking the residuals […].(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"If the observations/errors are not independent, the statistical formulations are completely unreliable unless corrections can be made.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Not all data sets lend themselves to data splitting. The data set may be too small to split and/or the fitted model may be a local smoother. In the first case, there is too little data upon which to build a model if the data is split; and in the second case, it is not expected the model for any part of the data to directly interpolate/extrapolate to any other part of the model. For these cases, a different approach to cross-validation is possible, something similar to bootstrapping." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

"Once a model has been fitted to the data, the deviations from the model are the residuals. If the model is appropriate, then the residuals mimic the true errors. Examination of the residuals often provides clues about departures from the modeling assumptions. Lack of fit - if there is curvature in the residuals, plotted versus the fitted values, this suggests there may be whole regions where the model overestimates the data and other whole regions where the model underestimates the data. This would suggest that the current model is too simple relative to some better model.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Prediction about the future assumes that the statistical model will continue to fit future data. There are several reasons this is often implausible, but it also seems clear that the model will often degenerate slowly in quality, so that the model will fit data only a few periods in the future almost as well as the data used to fit the model. To some degree, the reliability of extrapolation into the future involves subject-matter expertise.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"[The normality] assumption is the least important one for the reliability of the statistical procedures under discussion. Violations of the normality assumption can be divided into two general forms: Distributions that have heavier tails than the normal and distributions that are skewed rather than symmetric. If data is skewed, the formulas we are discussing are still valid as long as the sample size is sufficiently large. Although the guidance about 'how skewed' and 'how large a sample' can be quite vague, since the greater the skew, the larger the required sample size. For the data commonly used in time series and for the sample sizes (which are generally quite large) used, skew is not a problem. On the other hand, heavy tails can be very problematic." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

 "The random element in most data analysis is assumed to be white noise - normal errors independent of each other. In a time series, the errors are often linked so that independence cannot be assumed (the last examples). Modeling the nature of this dependence is the key to time series.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Transformations of data alter statistics. For example, the mean of a data set can be found, but it is not easy to relate the mean of a data set to the mean of the logarithm of that data set. The median is far friendlier to transformations. If the median of a data set is found, then the logarithm of the data set is analyzed; the median of the log transformed data will be the log of the original median.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014) 

"When data is not normal, the reason the formulas are working is usually the central limit theorem. For large sample sizes, the formulas are producing parameter estimates that are approximately normal even when the data is not itself normal. The central limit theorem does make some assumptions and one is that the mean and variance of the population exist. Outliers in the data are evidence that these assumptions may not be true. Persistent outliers in the data, ones that are not errors and cannot be otherwise explained, suggest that the usual procedures based on the central limit theorem are not applicable.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Whenever the data is periodic, at some level, there are only as many observations as the number of complete periods. This global feature of the data suggests caution in understanding more detailed features of the data. While a curvature model might be appropriate for this data, there is too little data to know this, and some skepticism might be in order if such a model were fitted to the data." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

🖍️Erik J Larson - Collected Quotes

"A well-known theorem called the 'no free lunch' theorem proves exactly what we anecdotally witness when designing and building learning systems. The theorem states that any bias-free learning system will perform no better than chance when applied to arbitrary problems. This is a fancy way of stating that designers of systems must give the system a bias deliberately, so it learns what’s intended. As the theorem states, a truly bias- free system is useless." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"First, intelligence is situational - there is no such thing as general intelligence. Your brain is one piece in a broader system which includes your body, your environment, other humans, and culture as a whole. Second, it is contextual - far from existing in a vacuum, any individual intelligence will always be both defined and limited by its environment. (And currently, the environment, not the brain, is acting as the bottleneck to intelligence.) Third, human intelligence is largely externalized, contained not in your brain but in your civilization. Think of individuals as tools, whose brains are modules in a cognitive system much larger than themselves - a system that is self-improving and has been for a long time." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Inference is to bring about a new thought, which in logic amounts to drawing a conclusion, and more generally involves using what we already know, and what we see or observe, to update prior beliefs. […] Inference is also a leap of sorts, deemed reasonable […] Inference is a basic cognitive act for intelligent minds. If a cognitive agent (a person, an AI system) is not intelligent, it will infer badly. But any system that infers at all must have some basic intelligence, because the very act of using what is known and what is observed to update beliefs is inescapably tied up with what we mean by intelligence. If an AI system is not inferring at all, it doesn’t really deserve to be called AI." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Machine learning bias is typically understood as a source of learning error, a technical problem. […] Machine learning bias can introduce error simply because the system doesn’t 'look' for certain solutions in the first place. But bias is actually necessary in machine learning - it’s part of learning itself." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"People who assume that extensions of modern machine learning methods like deep learning will somehow 'train up', or learn to be intelligent like humans, do not understand the fundamental limitations that are already known. Admitting the necessity of supplying a bias to learning systems is tantamount to Turing’s observing that insights about mathematics must be supplied by human minds from outside formal methods, since machine learning bias is determined, prior to learning, by human designers." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"[...] the focus on Big Data AI seems to be an excuse to put forth a number of vague and hand-waving theories, where the actual details and the ultimate success of neuroscience is handed over to quasi-mythological claims about the powers of large datasets and inductive computation. Where humans fail to illuminate a complicated domain with testable theory, machine learning and big data supposedly can step in and render traditional concerns about finding robust theories. This seems to be the logic of Data Brain efforts today." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"The idea that we can predict the arrival of AI typically sneaks in a premise, to varying degrees acknowledged, that successes on narrow AI systems like playing games will scale up to general intelligence, and so the predictive line from artificial intelligence to artificial general intelligence can be drawn with some confidence. This is a bad assumption, both for encouraging progress in the field toward artificial general intelligence, and for the logic of the argument for prediction." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"The problem-solving view of intelligence helps explain the production of invariably narrow applications of AI throughout its history. Game playing, for instance, has been a source of constant inspiration for the development of advanced AI techniques, but games are simplifications of life that reward simplified views of intelligence. […] Treating intelligence as problem-solving thus gives us narrow applications." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"To accomplish their goals, what are now called machine learning systems must each learn something specific. Researchers call this giving the machine a 'bias'. […] A bias in machine learning means that the system is designed and tuned to learn something. But this is, of course, just the problem of producing narrow problem-solving applications." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

10 April 2006

🖍️Steven S Skiena - Collected Quotes

"Bias is error from incorrect assumptions built into the model, such as restricting an interpolating function to be linear instead of a higher-order curve. [...] Errors of bias produce underfit models. They do not fit the training data as tightly as possible, were they allowed the freedom to do so. In popular discourse, I associate the word 'bias' with prejudice, and the correspondence is fairly apt: an apriori assumption that one group is inferior to another will result in less accurate predictions than an unbiased one. Models that perform lousy on both training and testing data are underfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Exploratory data analysis is the search for patterns and trends in a given data set. Visualization techniques play an important part in this quest. Looking carefully at your data is important for several reasons, including identifying mistakes in collection/processing, finding violations of statistical assumptions, and suggesting interesting hypotheses." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"The advent of massive data sets is changing in the way science is done. The traditional scientific method is hypothesis driven. The researcher formulates a theory of how the world works, and then seeks to support or reject this hypothesis based on data. By contrast, data-driven science starts by assembling a substantial data set, and then hunts for patterns that ideally will play the role of hypotheses for future analysis." (Steven S Skiena, "The Data Science Design Manual", 2017)

"The danger of overfitting is particularly severe when the training data is not a perfect gold standard. Human class annotations are often subjective and inconsistent, leading boosting to amplify the noise at the expense of the signal. The best boosting algorithms will deal with overfitting though regularization. The goal will be to minimize the number of non-zero coefficients, and avoid large coefficients that place too much faith in any one classifier in the ensemble." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Using noise (the uncorrelated variables) to fit noise (the residual left from a simple model on the genuinely correlated variables) is asking for trouble." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Variables which follow symmetric, bell-shaped distributions tend to be nice as features in models. They show substantial variation, so they can be used to discriminate between things, but not over such a wide range that outliers are overwhelming." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

🖍️Arthur L Bowley - Collected Quotes

"A knowledge of statistics is like a knowledge of foreign languages or of algebra; it may prove of use at any time under any circumstances." (Arthur L Bowley, "Elements of Statistics", 1901)

"A statistical estimate may be good or bad, accurate or the reverse; but in almost all cases it is likely to be more accurate than a casual observer’s impression, and the nature of things can only be disproved by statistical methods." (Arthur L Bowley, "Elements of Statistics", 1901)

"Great numbers are not counted correctly to a unit, they are estimated; and we might perhaps point to this as a division between arithmetic and statistics, that whereas arithmetic attains exactness, statistics deals with estimates, sometimes very accurate, and very often sufficiently so for their purpose, but never mathematically exact." (Arthur L Bowley, "Elements of Statistics", 1901)

"Some of the common ways of producing a false statistical argument are to quote figures without their context, omitting the cautions as to their incompleteness, or to apply them to a group of phenomena quite different to that to which they in reality relate; to take these estimates referring to only part of a group as complete; to enumerate the events favorable to an argument, omitting the other side; and to argue hastily from effect to cause, this last error being the one most often fathered on to statistics. For all these elementary mistakes in logic, statistics is held responsible." (Arthur L Bowley, "Elements of Statistics", 1901)

"[…] statistics is the science of the measurement of the social organism, regarded as a whole, in all its manifestations." (Arthur L Bowley, "Elements of Statistics", 1901)

"Statistics may rightly be called the science of averages. […] Great numbers and the averages resulting from them, such as we always obtain in measuring social phenomena, have great inertia. […] It is this constancy of great numbers that makes statistical measurement possible. It is to great numbers that statistical measurement chiefly applies." (Arthur L Bowley, "Elements of Statistics", 1901)

"Statistics may, for instance, be called the science of counting. Counting appears at first sight to be a very simple operation, which any one can perform or which can be done automatically; but, as a matter of fact, when we come to large numbers, e.g., the population of the United Kingdom, counting is by no means easy, or within the power of an individual; limits of time and place alone prevent it being so carried out, and in no way can absolute accuracy be obtained when the numbers surpass certain limits." (Arthur L Bowley, "Elements of Statistics", 1901)

"By [diagrams] it is possible to present at a glance all the facts which could be obtained from figures as to the increase,  fluctuations, and relative importance of prices, quantities, and values of different classes of goods and trade with various countries; while the sharp irregularities of the curves give emphasis to the disturbing causes which produce any striking change." (Arthur L Bowley, "A Short Account of England's Foreign Trade in the Nineteenth Century, its Economic and Social Results", 1905)

"Of itself an arithmetic average is more likely to conceal than to disclose important facts; it is the nature of an abbreviation, and is often an excuse for laziness." (Arthur L Bowley, "The Nature and Purpose of the Measurement of Social Phenomena", 1915)

"[...] the problems of the errors that arise in the process of sampling have been chiefly discussed from the point of view of the universe, not of the sample; that is, the question has been how far will a sample represent a given universe? The practical question is, however, the converse: what can we infer about a universe from a given sample? This involves the difficult and elusive theory of inverse probability, for it may be put in the form, which of the various universes from which the sample may a priori have been drawn may be expected to have yielded that sample?" (Arthur L Bowley, "Elements of Statistics. 5th Ed., 1926)

"Statistics are numerical statements of facts in any department of inquiry, placed in relation to each other; statistical methods are devices for abbreviating and classifying the statements and making clear the relations." (Arthur L Bowley, "An Elementary Manual of Statistics", 1934)

🖍️Alan Turing - Collected Quotes

"A computer would deserve to be called intelligent if it could deceive a human into believing that it was human." (Alan Turing, "Computing Machinery and Intelligence", Mind Vol. 59, 1950)

"If one wants to make a machine mimic the behaviour of the human computer in some complex operation one has to ask him how it is done, and then translate the answer into the form of an instruction table. Constructing instruction tables is usually described as 'programming'." (Alan Turing, "Computing Machinery and Intelligence", Mind Vol. 59, 1950)

"It is unnecessary to design various new machines to do various computing processes. They can all be done with one digital computer, suitably programmed for each case." (Alan Turing, "Computing Machinery and Intelligence", Mind Vol. 59, 1950)

"The idea behind digital computers may be explained by saying that these machines are intended to carry out any operations which could be done by a human computer.” (Alan Turing, “Computing Machinery and Intelligence”, Mind Vol. 59, 1950)

"The original question, 'Can machines think?:, I believe too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted." (Alan M. Turing, 1950) 

"The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false. A natural consequence of doing so is that one then assumes that there is no virtue in the mere working out of consequences from data and general principles." (Alan Turing, "Computing Machinery and Intelligence", Mind Vol. 59, 1950)

"This model will be a simplification and an idealization, and consequently a falsification. It is to be hoped that the features retained for discussion are those of greatest importance in the present state of knowledge." (Alan M Turing, "The Chemical Basis of Morphogenesis" , Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, Vol. 237 (641), 1952) 

"Almost everyone now acknowledges that theory and experiment, model making, theory construction and linguistics all go together, and that the successful development of a science of behavior depends upon a ‘total approach’ in which, given that the computer ‘is the only large-scale universal model’ that we possess, ‘we may expect to follow the prescription of Simon and construct our models - or most of them - in the form of computer programs’." (Alan M Turing)

"Science is a differential equation. Religion is a boundary condition." (Alan M Turing)

"The whole thinking process is rather mysterious to us, but I believe that the attempt to make a thinking machine will help us greatly in finding out how we think ourselves." (Alan M Turing)

"We do not need to have an infinity of different machines doing different jobs. A single one will suffice. The engineering problem of producing various machines for various jobs is replaced by the office work of "programming" the universal machine to do these jobs." (Alan M Turing)

08 April 2006

🖍️John H Johnson - Collected Quotes

"A correlation is simply a bivariate relationship - a fancy way of saying that there is a relationship between two ('bi') variables ('variate'). And a bivariate relationship doesn’t prove that one thing caused the other. Think of it this way: you can observe that two things appear to be related statistically, but that doesn’t tell you the answer to any of the questions you might really care about - why is there a relationship and what does it mean to us as a consumer of data?" (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"A good chart can tell a story about the data, helping you understand relationships among data so you can make better decisions. The wrong chart can make a royal mess out of even the best data set." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Although some people use them interchangeably, probability and odds are not the same and people often misuse the terms. Probability is the likelihood that an outcome will occur. The odds of something happening, statistically speaking, is the ratio of favorable outcomes to unfavorable outcomes." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"[…] average isn’t something that should be considered in isolation. Your average is only as good as the data that supports it. If your sample isn’t representative of the full population, if you cherry- picked the data, or if there are other issues with your data, your average may be misleading." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Big data is sexy. It makes the headlines. […] But, as you’ve seen already, it’s the little data - the small bits and bytes of data that you’re bombarded with in your everyday life - that often has a huge effect on your health, your wallet, your job, your relationships, and so much more, every single day. From food labels to weather forecasts, your bank account to your doctor’s office, everydata is all around you." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Confirmation bias can affect nearly every aspect of the way you look at data, from sampling and observation to forecasting - so it’s something  to keep in mind anytime you’re interpreting data. When it comes to correlation versus causation, confirmation bias is one reason that some people ignore omitted variables - because they’re making the jump from correlation to causation based on preconceptions, not the actual evidence." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Essentially, magnitude is the size of the effect. It’s a way to determine if the results are meaningful. Without magnitude, it’s hard to get a sense of how much something matters. […] the magnitude of an effect can change, depending on the relationship." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"First, you need to think about whether the universe of data that is being studied or collected is representative of the underlying population. […] Second, you need to consider what you are analyzing in the data that has been collected - are you analyzing all of the data, or only part of it? […] You always have to ask - can you accurately extend your findings from the sample to the general population? That’s called external validity - when you can extend the results from your sample to draw meaningful conclusions about the full population." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Forecasting is difficult because we don’t know everything about how the world works. There are unforeseen events. Unknown processes. Random occurrences. People are unpredictable, and things don’t always stay the same. The data you’re studying can change - as can your understanding of the underlying process." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Having a large sample size doesn’t guarantee better results if it’s the wrong large sample." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"If the underlying data isn’t sampled accurately, it’s like building a house on a foundation that’s missing a few chunks of concrete. Maybe it won’t matter. But if the missing concrete is in the wrong spot - or if there is too much concrete missing - the whole house can come falling down." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"If you’re looking at an average, you are - by definition - studying a specific sample set. If you’re comparing averages, and those averages come from different sample sets, the differences in the sample sets may well be manifested in the averages. Remember, an average is only as good as the underlying data." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"If your conclusions change dramatically by excluding a data point, then that data point is a strong candidate to be an outlier. In a good statistical model, you would expect that you can drop a data point without seeing a substantive difference in the results. It’s something to think about when looking for outliers." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"In the real world, statistical issues rarely exist in isolation. You’re going to come across cases where there’s more than one problem with the data. For example, just because you identify some sampling errors doesn’t mean there aren’t also issues with cherry picking and correlations and averages and forecasts - or simply more sampling issues, for that matter. Some cases may have no statistical issues, some may have dozens. But you need to keep your eyes open in order to spot them all." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Just as with aggregated data, an average is a summary statistic that can tell you something about the data - but it is only one metric, and oftentimes a deceiving one at that. By taking all of the data and boiling it down to one value, an average (and other summary statistics) may imply that all of the underlying data is the same, even when it’s not." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Keep in mind that a weighted average may be different than a simple (non- weighted) average because a weighted average - by definition - counts certain data points more heavily. When you’re thinking about an average, try to determine if it’s a simple average or a weighted average. If it’s weighted, ask yourself how it’s being weighted, and see which data points count more than others." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"[…] remember that, as with many statistical issues, sampling in and of itself is not a good or a bad thing. Sampling is a powerful tool that allows us to learn something, when looking at the full population is not feasible (or simply isn’t the preferred option). And you shouldn’t be misled to think that you always should use all the data. In fact, using a sample of data can be incredibly helpful." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Statistical significance is a concept used by scientists and researchers to set an objective standard that can be used to determine whether or not a particular relationship 'statistically' exists in the data. Scientists test for statistical significance to distinguish between whether an observed effect is present in the data (given a high degree of probability), or just due to chance. It is important to note that finding a statistically significant relationship tells us nothing about whether a relationship is a simple correlation or a causal one, and it also can’t tell us anything about whether some omitted factor is driving the result." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Statistical significance refers to the probability that something is true. It’s a measure of how probable it is that the effect we’re seeing is real (rather than due to chance occurrence), which is why it’s typically measured with a p-value. P, in this case, stands for probability. If you accept p-values as a measure of statistical significance, then the lower your p-value is, the less likely it is that the results you’re seeing are due to chance alone." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"This idea of looking for answers is related to confirmation bias, which is the tendency to interpret data in a way that reinforces your preconceptions. With confirmation bias, you aren’t just looking for an answer - you’re looking for a specific answer." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"The more uncertainty there is in your sample, the more uncertainty there will be in your forecast. A prediction is only as good as the information that goes into it, and in statistics, we call the basis for our forecasts a model. The model represents all the inputs - the factors you determine will predict the future outcomes, the underlying sample data you rely upon, and the relationship you apply mathematically. In other words, the model captures how you think various factors relate to one another." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"The process of making statistical conclusions about the data is called drawing an inference. In any statistical analysis, if you’re going to draw an inference, the goal is to make sure you have the right data to answer the question you are asking." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"The strength of an average is that it takes all the values in your data set and simplifies them down to a single number. This strength, however, is also the great danger of an average. If every data point is exactly the same (picture a row of identical bricks) then an average may, in fact, accurately reflect something about each one. But if your population isn’t similar along many key dimensions - and many data sets aren’t - then the average will likely obscure data points that are above or below the average, or parts of the data set that look different from the average. […] Another way that averages can mislead is that they typically only capture one aspect of the data." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"The tricky part is that there aren’t really any hard- and- fast rules when it comes to identifying outliers. Some economists say an outlier is anything that’s a certain distance away from the mean, but in practice it’s fairly subjective and open to interpretation. That’s why statisticians spend so much time looking at data on a case-by-case basis to determine what is - and isn’t - an outlier." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Using a sample to estimate results in the full population is common in data analysis. But you have to be careful, because even small mistakes can quickly become big ones, given that each observation represents many others. There are also many factors you need to consider if you want to make sure your inferences are accurate." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

07 April 2006

🖍️Victor Cohn - Collected Quotes

"Different problems require different methods, different numbers. One of the most basic questions in science is: Is the study designed in a way that will allow the researchers to answer the questions that they want answered?" (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"If the group is large enough, even very small differences can become statistically significant." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"In common language and ordinary logic, a low likelihood of chance alone calling the shots means 'it’s close to certain'. A strong likelihood that chance could have ruled means 'it almost certainly can’t be'." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"Most importantly, much of statistics involves clear thinking rather than numbers. And much, at least much of the statistical principles that reporters can most readily apply, is good sense." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"Nature is complex, and almost all methods of observation and experiment are imperfect." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"[…] nonparametric methods […] are methods of examining data that do not rely on a numerical distribution. As a result, they don’t allow a few very large or very small or very wild numbers to run away with the analysis." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"Regression toward the mean is the tendency of all values in every field of science – physical, biological, social, and economic – to move toward the average. […] The regression effect is common to all repeated measurements. Regression is part of an even more basic phenomenon: variation, or variability. Virtually everything that is measured varies from measurement to measurement. When repeated, every experiment has at least slightly different results." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"Statistically, power means the probability of finding something if it’s there.[…] statisticians think of power as a function of both sample size and the accuracy of measurement, because that too affects the probability of finding something." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"The big problems with statistics, say its best practitioners, have little to do with computations and formulas. They have to do with judgment - how to design a study, how to conduct it, then how to analyze and interpret the results. Journalists reporting on statistics have many chances to do harm by shaky reporting, and so are also called on to make sophisticated judgments. How, then, can we tell which studies seem credible, which we should report?" (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"The first thing that you should understand about science is that it is almost always uncertain. The scientific process allows science to move ahead without waiting for an elusive 'proof positive'. […] How can science afford to act on less than certainty? Because science is a continuing story - always retesting ideas. One scientific finding leads scientists to conduct more research, which may support and expand on the original finding." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"Where many known, measurable factors are involved, statisticians can use mathematical techniques to account for all the variables and try to find which are the truly important predictors. The terms for this include multiple regression, multivariate analysis, and discriminant analysis, and factor, cluster, path, and two-stage least-squares analyses." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

06 April 2006

🖍️Antoine Cornuéjols - Collected Quotes

"Hence, has machine learning uncovered truths that escaped the notice of philosophy, psychology, and biology? On one hand, it can be argued that machine learning has at least provided grounds for some of the claims of philosophy regarding the nature of knowledge and its acquisition. Against pure empiricism, induction requires prior knowledge, if only in the form of a constrained hypothesis space. In addition, there is a kind of conservation law at play in induction. The more a priori knowledge there is, the easier learning is and the fewer data are needed, and vice versa. The statistical study of machine learning allows quantifying this trade-off." (Antoine Cornuéjol, "The Necessity of Order in Machine Learning: Is Order in Order?", 2007)

"In effect, machine learning research has already brought us several interesting concepts. Most prominently, it has stressed the benefit of distinguishing between the properties of the hypothesis space - its richness and the valuation scheme associated with it - and the characteristics of the actual search procedure in this space, guided by the training data. This in turn suggests two important factors related to sequencing effects, namely forgetting and the nonoptimality of the search procedure. Both are key parameters than need to be thoroughly understood if one is to master sequencing effects." (Antoine Cornuéjol, "The Necessity of Order in Machine Learning: Is Order in Order?", 2007)

"On the other hand, the algorithms produced in machine learning during the last few decades seem quite remote from what can be expected to account for natural cognition. For one thing, there is virtually no notion of knowledge organization in these methods. Learning is supposed to arise on a blank slate, albeit a constrained one, and its output is not supposed to be used for subsequent learning episodes. Neither is there any hierarchy in the 'knowledge' produced. Learning is not conceived as an ongoing activity but rather as a one-shot process more akin to data analysis than to a gradual discovery development or even to an adaptive process. " (Antoine Cornuéjol, "The Necessity of Order in Machine Learning: Is Order in Order?", 2007)

"[...] the theory that establishes a link between the empirical fit of the candidate hypothesis with respect to the data and its expected value on unseen events becomes essentially inoperative if the data are not supposed to be independent of each other. This requirement is obviously at odds with most natural learning settings, where either the learner is actively searching for data or where learning occurs under the guidance of a teacher who is carefully choosing the data and their order of presentation." (Antoine Cornuéjol, "The Necessity of Order in Machine Learning: Is Order in Order?", 2007)

"There are many control parameters to a learning system. The question is to identify, at a sufficiently high level, the ones that can play a key role in sequencing effects. Because learning can be seen as the search for an optimal hypothesis in a given space under an inductive criteria defined over the training set, three means to control learning readily appear. The first one corresponds to a change of the hypothesis space. The second consists in modifying the optimization landscape. This can be done by changing either the training set (for instance, by a forgetting mechanism) or the inductive criteria. Finally, one can also fiddle with the exploration process. For instance, in the case of a gradient search, slowing down the search process can prevent the system from having time to find the local optimum, which, in turn, can introduce sequencing effects." (Antoine Cornuéjol, "The Necessity of Order in Machine Learning: Is Order in Order?", 2007)

"While it has been always considered that a piece of information could at worst be useless, it should now be acknowledged that it can have a negative impact. There is simply no theory of information at the moment offering a framework ready to account for this in general." (Antoine Cornuéjol, "The Necessity of Order in Machine Learning: Is Order in Order?", 2007)

