15 April 2006

🖍️Ray Kurzweil - Collected Quotes

"A primary reason that evolution - of life-forms or technology - speeds up is that it builds on its own increasing order." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"It is in the nature of exponential growth that events develop extremely slowly for extremely long periods of time, but as one glides through the knee of the curve, events erupt at an increasingly furious pace. And that is what we will experience as we enter the twenty-first century." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"Neither noise nor information is predictable." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"Once a computer achieves human intelligence it will necessarily roar past it." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999)

"Sometimes, a deeper order - a better fit to a purpose - is achieved through simplification rather than further increases in complexity." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999)

"The Law of Accelerating Returns: As order exponentially increases, time exponentially speeds up (that is, the time interval between salient events grows shorter as time passes)." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"The Law of Time and Chaos: In a process, the time interval between salient events (that is, events that change the nature of the process, or significantly affect the future of the process) expands of contracts along with the amount of chaos." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"Order [...] is information that fits a purpose." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999)

"A key aspect of a probabilistic fractal is that it enables the generation of a great deal of apparent complexity, including extensive varying detail, from a relatively small amount of design information. Biology uses this same principle. Genes supply the design information, but the detail in an organism is vastly greater than the genetic design information."  (Ray Kurzweil, "The Singularity is Near", 2005)

"Although the Singularity has many faces, its most important implication is this: our technology will match and then vastly exceed the refinement and suppleness of what we regard as the best of human traits."  (Ray Kurzweil, "The Singularity is Near", 2005)

"Evolution moves towards greater complexity, greater elegance, greater knowledge, greater intelligence, greater beauty, greater creativity, and greater levels of subtle attributes such as love. […] Of course, even the accelerating growth of evolution never achieves an infinite level, but as it explodes exponentially it certainly moves rapidly in that direction." (Ray Kurzweil, "The Singularity is Near", 2005)

"However, the law of accelerating returns pertains to evolution, which is not a closed system. It takes place amid great chaos and indeed depends on the disorder in its midst, from which it draws its options for diversity. And from these options, an evolutionary process continually prunes its choices to create ever greater order."  (Ray Kurzweil, "The Singularity is Near", 2005)

"'Increasing complexity' on its own is not, however, the ultimate goal or end-product of these evolutionary processes. Evolution results in better answers, not necessarily more complicated ones. Sometimes a superior solution is a simpler one."  (Ray Kurzweil, "The Singularity is Near", 2005)

"Machines can pool their resources, intelligence, and memories. Two machines—or one million machines - can join together to become one and then become separate again. Multiple machines can do both at the same time: become one and separate simultaneously. Humans call this falling in love, but our biological ability to do this is fleeting and unreliable." (Ray Kurzweil, "The Singularity is Near", 2005)

"Most long-range forecasts of what is technically feasible in future time periods dramatically underestimate the power of future developments because they are based on what I call the 'intuitive linear' view of history rather than the 'historical exponential'.” view'." (Ray Kurzweil, "The Singularity is Near", 2005)

"Order is information that fits a purpose. The measure of order is the measure of how well the information fits the purpose."  (Ray Kurzweil, "The Singularity is Near", 2005)

"The first idea is that human progress is exponential (that is, it expands by repeatedly multiplying by a constant) rather than linear (that is, expanding by repeatedly adding a constant). Linear versus exponential: Linear growth is steady; exponential growth becomes explosive." (Ray Kurzweil, "The Singularity is Near", 2005)

"The Singularity will represent the culmination of the merger of our biological thinking and existence with our technology, resulting in a world that is still human but that transcends our biological roots. There will be no distinction, post-Singularity, between human and machine or between physical and virtual reality. If you wonder what will remain unequivocally human in such a world, it’s simply this quality: ours is the species that inherently seeks to extend its physical and mental reach beyond current limitations." (Ray Kurzweil, "The Singularity is Near", 2005)

"[...] there are no hard problems, only problems that are hard to a certain level of intelligence." (Ray Kurzweil, "The Singularity is Near", 2005)

"When a machine manages to be simultaneously meaningful and surprising in the same rich way, it too compels a mentalistic interpretation. Of course, somewhere behind the scenes, there are programmers who, in principle, have a mechanical interpretation. But even for them, that interpretation loses its grip as the working program fills its memory with details too voluminous for them to grasp."  (Ray Kurzweil, "The Singularity is Near", 2005)

"If understanding language and other phenomena through statistical analysis does not count as true understanding, then humans have no understanding either." (Ray Kurzweil, "How to Create a Mind", 2012)

"The story of evolution unfolds with increasing levels of abstraction." (Ray Kurzweil, "How to Create a Mind", 2012)

14 April 2006

🖍️Ian Goodfellow - Collected Quotes

"Learning theory claims that a machine learning algorithm can generalize well from a finite training set of examples. This seems to contradict some basic principles of logic. Inductive reasoning, or inferring general rules from a limited set of examples, is not logically valid. To logically infer a rule describing every member of a set, one must have information about every member of that set." (Ian Goodfellow et al, "Deep Learning", 2015)

"A representation learning algorithm can discover a good set of features for a simple task in minutes, or for a complex task in hours to months." (Ian Goodfellow et al, "Deep Learning", 2015)

"Another crowning achievement of deep learning is its extension to the domain of reinforcement learning. In the context of reinforcement learning, an autonomous agent must learn to perform a task by trial and error, without any guidance from the human operator." (Ian Goodfellow et al, "Deep Learning", 2015)

"Cognitive science is an interdisciplinary approach to understanding the mind, combining multiple different levels of analysis."

"Logic provides a set of formal rules for determining what propositions are implied to be true or false given the assumption that some other set of propositions is true or false. Probability theory provides a set of formal rules for determining the likelihood of a proposition being true given the likelihood of other propositions." (Ian Goodfellow et al, "Deep Learning", 2015)

"The field of deep learning is primarily concerned with how to build computer systems that are able to successfully solve tasks requiring intelligence, while the field of computational neuroscience is primarily concerned with building more accurate models of how the brain actually works." (Ian Goodfellow et al, "Deep Learning", 2015)

"The no free lunch theorem for machine learning states that, averaged over all possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points. In other words, in some sense, no machine learning algorithm is universally any better than any other. The most sophisticated algorithm we can conceive of has the same average performance (over all possible tasks) as merely predicting that every point belongs to the same class. [...] the goal of machine learning research is not to seek a universal learning algorithm or the absolute best learning algorithm. Instead, our goal is to understand what kinds of distributions are relevant to the 'real world' that an AI agent experiences, and what kinds of machine learning algorithms perform well on data drawn from the kinds of data generating distributions we care about." (Ian Goodfellow et al, "Deep Learning", 2015)

"The no free lunch theorem implies that we must design our machine learning algorithms to perform well on a specific task. We do so by building a set of preferences into the learning algorithm. When these preferences are aligned with the learning problems we ask the algorithm to solve, it performs better." (Ian Goodfellow et al, "Deep Learning", 2015)

🖍️N D Lewis - Collected Quotes

"Deep learning is an area of machine learning that emerged from the intersection of neural networks, artificial intelligence, graphical modeling, optimization, pattern recognition and signal processing." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Overfitting is like attending a concert of your favorite band. Depending on the acoustics of the concert venue you will hear both music and noise from the screams of the crowd to reverberations off walls and so on. Overfitting happens when your model perfectly fits both the music and the noise when the intent is to fit the structure (the music). It is generally a result of the predictor being too complex (recall Occams Razor) so that it fits the underlying structure as well as the noise. The consequence is a small or zero test set classification error. Alas, this super low error rate will fail to materialize on future unseen samples. One consequence of overfitting is poor generalization (prediction) on future data." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Roughly stated, the No Free Lunch theorem states that in the lack of prior knowledge (i.e. inductive bias) on average all predictive algorithms that search for the minimum classification error (or extremum over any risk metric) have identical performance according to any measure." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Underfitting can also be a problem. It happens when the predictor is too simplistic or rigid to capture the underlying characteristics of the data. In this case the test error will be rather large." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"The bias variance decomposition is a useful tool for understanding classifier behavior. It turns out the expected misclassification rate can be decomposed into two components, a reducible error and irreducible error [...] Irreducible error or inherent uncertainty is associated with the natural variability in the phenomenon under study, and is therefore beyond our control. [...] Reducible error, as the name suggests, can be minimized. It can be decomposed into error due to squared bias and error due to variance." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"The power of deep learning models comes from their ability to classify or predict nonlinear data using a modest number of parallel nonlinear steps4. A deep learning model learns the input data features hierarchy all the way from raw data input to the actual classification of the data. Each layer extracts features from the output of the previous layer." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

🖍️Roger J Barlow - Collected Quotes

"Averaging results, whether weighted or not, needs to be done with due caution and commonsense. Even though a measurement has a small quoted error it can still be, not to put too fine a point on it, wrong. If two results are in blatant and obvious disagreement, any average is meaningless and there is no point in performing it. Other cases may be less outrageous, and it may not be clear whether the difference is due to incompatibility or just unlucky chance." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Good programming style is important, not just a luxury. Programs are organic; a good program lives a long time, and has to be changed to meet new uses; a well-written program makes such adaptations easier. A logical, clearly written program helps greatly when tracking down possible bugs. However, although there is a lot of good advice on style laid down by various people, there are no hard and fast rules, only guidelines. It is like writing English in this respect." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"In everyday life, 'estimation' means a rough and imprecise procedure leading to a rough and imprecise result. You 'estimate' when you cannot measure exactly. In statistics, on the other hand, 'estimation' is a technical term. It means a precise and accurate procedure, leading to a result which may be imprecise, but where at least the extent of the imprecision is known. It has nothing to do with approximation. You have some data, from which you want to draw conclusions and produce a 'best' value for some particular numerical quantity (or perhaps for several quantities), and you probably also want to know how reliable this value is, i.e. what the error is on your estimate." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Least squares' means just what it says: you minimise the (suitably weighted) squared difference between a set of measurements and their predicted values. This is done by varying the parameters you want to estimate: the predicted values are adjusted so as to be close to the measurements; squaring the differences means that greater importance is placed on removing the large deviations." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)
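
Barlow's description translates almost line for line into code. The following is a minimal sketch of a weighted straight-line fit via the standard normal equations (not code from the book):

```python
# Weighted least squares for a straight line y = a + b*x: minimise
# sum_i w_i * (y_i - a - b*x_i)^2 with w_i = 1 / sigma_i^2, so precisely
# measured points matter most and large deviations are punished by the square.
def weighted_line_fit(xs, ys, sigmas):
    ws = [1.0 / s ** 2 for s in sigmas]
    S = sum(ws)
    Sx = sum(w * x for w, x in zip(ws, xs))
    Sy = sum(w * y for w, y in zip(ws, ys))
    Sxx = sum(w * x * x for w, x in zip(ws, xs))
    Sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = S * Sxx - Sx ** 2
    b = (S * Sxy - Sx * Sy) / det    # slope
    a = (Sxx * Sy - Sx * Sxy) / det  # intercept
    return a, b

# Measurements lying exactly on y = 1 + 2x are recovered whatever the weights.
a, b = weighted_line_fit([0, 1, 2, 3], [1, 3, 5, 7], [0.5, 1.0, 1.0, 2.0])
```

The parameters a and b are the ones being "varied"; the closed form simply solves for the minimum of the weighted squared difference directly.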

"Probabilities are pure numbers. Probability densities, on the other hand, have dimensions, the inverse of those of the variable x to which they apply." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Science is supposed to explain to us what is actually happening, and indeed what will happen, in the world. Unfortunately as soon as you try and do something useful with it, sordid arithmetical numbers start getting in the way and messing up the basic scientific laws." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Statistics is a tool. In experimental science you plan and carry out experiments, and then analyse and interpret the results. To do this you use statistical arguments and calculations. Like any other tool - an oscilloscope, for example, or a spectrometer, or even a humble spanner - you can use it delicately or clumsily, skillfully or ineptly. The more you know about it and understand how it works, the better you will be able to use it and the more useful it will be." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Subjective probability, also known as Bayesian statistics, pushes Bayes' theorem further by applying it to statements of the type described as 'unscientific' in the frequency definition. The probability of a theory (e.g. that it will rain tomorrow or that parity is not violated) is considered to be a subjective 'degree of belief - it can perhaps be measured by seeing what odds the person concerned will offer as a bet. Subsequent experimental evidence then modifies the initial degree of belief, making it stronger or weaker according to whether the results agree or disagree with the predictions of the theory in question." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"The best way to learn to write good programs is to read other people's. Notice whether they are clear or obscure, and try and understand why. Then imitate the good points and avoid the bad in your own programs." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"The principle of maximum likelihood is not a rule that requires justification - it does not need to be proved. It is merely a sensible way of producing an estimator. But although the name 'maximum likelihood' has a nice ring to it - it suggest that your estimate is the 'most likely' value - this is an unfair interpretation; it is the estimate that makes your data most likely - another thing altogether." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"There is a technical difference between a bar chart and a histogram in that the number represented is proportional to the length of bar in the former and the area in the latter. This matters if non-uniform binning is used. Bar charts can be used for qualitative or quantitative data, whereas histograms can only be used for quantitative data, as no meaning can be attached to the width of the bins if the data are qualitative." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989) 

"There is an obvious parallel between an expectation value and the mean of a data sample. The former is a sum over a theoretical probability distribution and the latter is a (similar) sum over a real data sample. The law of large numbers ensures that if a data sample is described by a theoretical distribution, then as N, the size of the data sample, goes to infinity […]." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"When you want to use some data to give the answer to a question, the first step is to formulate the question precisely by expressing it as a hypothesis. Then you consider the consequences of that hypothesis, and choose a suitable test to apply to the data. From the result of the test you accept or reject the hypothesis according to prearranged criteria. This cannot be infallible, and there is always a chance of getting the wrong answer, so you try and reduce the chance of such a mistake to a level which you consider reasonable." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

13 April 2006

🖍️Peter L Bernstein - Collected Quotes

"A normal distribution is most unlikely, although not impossible, when the observations are dependent upon one another - that is, when the probability of one event is determined by a preceding event. The observations will fail to distribute themselves symmetrically around the mean." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"All the law [of large numbers] tells us is that the average of a large number of throws will be more likely than the average of a small number of throws to differ from the true average by less than some stated amount. And there will always be a possibility that the observed result will differ from the true average by a larger amount than the specified bound." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"But real-life situations often require us to measure probability in precisely this fashion - from sample to universe. In only rare cases does life replicate games of chance, for which we can determine the probability of an outcome before an event even occurs - a priori […] . In most instances, we have to estimate probabilities from what happened after the fact - a posteriori. The very notion of a posteriori implies experimentation and changing degrees of belief." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Probability theory is a serious instrument for forecasting, but the devil, as they say, is in the details - in the quality of information that forms the basis of probability estimates." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand. [...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Under conditions of uncertainty, both rationality and measurement are essential to decision-making. Rational people process information objectively: whatever errors they make in forecasting the future are random errors rather than the result of a stubborn bias toward either optimism or pessimism. They respond to new information on the basis of a clearly defined set of preferences. They know what they want, and they use the information in ways that support their preferences." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Under conditions of uncertainty, the choice is not between rejecting a hypothesis and accepting it, but between reject and not - reject. You can decide that the probability that you are wrong is so small that you should not reject the hypothesis. You can decide that the probability that you are wrong is so large that you should reject the hypothesis. But with any probability short of zero that you are wrong - certainty rather than uncertainty - you cannot accept a hypothesis." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Until we can distinguish between an event that is truly random and an event that is the result of cause and effect, we will never know whether what we see is what we'll get, nor how we got what we got. When we take a risk, we are betting on an outcome that will result from a decision we have made, though we do not know for certain what the outcome will be. The essence of risk management lies in maximizing the areas where we have some control over the outcome while minimizing the areas where we have absolutely no control over the outcome and the linkage between effect and cause is hidden from us." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"We can assemble big pieces of information and little pieces, but we can never get all the pieces together. We never know for sure how good our sample is. That uncertainty is what makes arriving at judgments so difficult and acting on them so risky. […] When information is lacking, we have to fall back on inductive reasoning and try to guess the odds." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Whenever we make any decision based on the expectation that matters will return to 'normal', we are employing the notion of regression to the mean." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

🖍️Phillip I Good - Collected Quotes

"A major problem with many studies is that the population of interest is not adequately defined before the sample is drawn. Don’t make this mistake. A second major source of error is that the sample proves to have been drawn from a different population than was originally envisioned." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"A permutation test based on the original observations is appropriate only if one can assume that under the null hypothesis the observations are identically distributed in each of the populations from which the samples are drawn. If we cannot make this assumption, we will need to transform the observations, throwing away some of the information about them so that the distributions of the transformed observations are identical." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"A well-formulated hypothesis will be both quantifiable and testable - that is, involve measurable quantities or refer to items that may be assigned to mutually exclusive categories. [...] When the objective of our investigations is to arrive at some sort of conclusion, then we need to have not only a hypothesis in mind, but also one or more potential alternative hypotheses." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Before we initiate data collection, we must have a firm idea of what we will measure. A second fundamental principle is also applicable to both experiments and surveys: Collect exact values whenever possible. Worry about grouping them in interval or discrete categories later. […] You can always group your results (and modify your groupings) after a study is completed. If after-the-fact grouping is a possibility, your design should state how the grouping will be determined; otherwise there will be the suspicion that you chose the grouping to obtain desired results." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Estimation methods should be impartial. Decisions should not depend on the accidental and quite irrelevant labeling of the samples. Nor should decisions depend on the units in which the measurements are made." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Every statistical procedure relies on certain assumptions for correctness. Errors in testing hypotheses come about either because the assumptions underlying the chosen test are not satisfied or because the chosen test is less powerful than other competing procedures."(Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"[…] finding at least one cluster of events in time or in spaceh as a greater probability than finding no clusters at all (equally spaced events)." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Graphical illustrations should be simple and pleasing to the eye, but the presentation must remain scientific. In other words, we want to avoid those graphical features that are purely decorative while keeping a critical eye open for opportunities to enhance the scientific inference we expect from the reader. A good graphical design should maximize the proportion of the ink used for communicating scientific information in the overall display." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"If the sample is not representative of the population because the sample is small or biased, not selected at random, or its constituents are not independent of one another, then the bootstrap will fail. […] For a given size sample, bootstrap estimates of percentiles in the tails will always be less accurate than estimates of more centrally located percentiles. Similarly, bootstrap interval estimates for the variance of a distribution will always be less accurate than estimates of central location such as the mean or median because the variance depends strongly upon extreme values in the population." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"More important than comparing the means of populations can be determining why the variances are different." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Most statistical procedures rely on two fundamental assumptions: that the observations are independent of one another and that they are identically distributed. If your methods of collection fail to honor these assumptions, then your analysis must fail also." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Never assign probabilities to the true state of nature, but only to the validity of your own predictions." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The greatest error associated with the use of statistical procedures is to make the assumption that one single statistical methodology can suffice for all applications. […] But one methodology can never be better than another, nor can estimation replace hypothesis testing or vice versa. Every methodology has a proper domain of application and another set of applications for which it fails. Every methodology has its drawbacks and its advantages, its assumptions, and its sources of error." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The sources of error in applying statistical procedures are legion and include all of the following: (•) Using the same set of data both to formulate hypotheses and to test them. (•) Taking samples from the wrong population or failing to specify the population(s) about which inferences are to be made in advance. (•) Failing to draw random, representative samples. (•) Measuring the wrong variables or failing to measure what you’d hoped to measure. (•) Using inappropriate or inefficient statistical methods. (•) Failing to validate models. But perhaps the most serious source of error lies in letting statistical procedures make decisions for you." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The vast majority of errors in estimation stem from a failure to measure what one wanted to measure or what one thought one was measuring. Misleading definitions, inaccurate measurements, errors in recording and transcription, and confounding variables plague results. To forestall such errors, review your data collection protocols and procedure manuals before you begin, run several preliminary trials, record potential confounding variables, monitor data collection, and review the data as they are collected." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The vast majority of errors in Statistics - and not incidentally, in most human endeavors - arise from a reluctance (or even an inability) to plan. Some demon (or demonic manager) seems to be urging us to cross the street before we’ve had the opportunity to look both ways. Even on those rare occasions when we do design an experiment, we seem more obsessed with the mechanics than with the concepts that underlie it." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Use statistics as a guide to decision making rather than a mandate." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"When we assert that for a given population a percentage of samples will have a specific composition, this also is a deduction. But when we make an inductive generalization about a population based upon our analysis of a sample, we are on shakier ground. It is one thing to assert that if an observation comes from a normal distribution with mean zero, the probability is one-half that it is positive. It is quite another if, on observing that half the observations in the sample are positive, we assert that half of all the possible observations that might be drawn from that population will be positive also." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"While a null hypothesis can facilitate statistical inquiry - an exact permutation test is impossible without it - it is never mandated. In any event, virtually any quantifiable hypothesis can be converted into null form. There is no excuse and no need to be content with a meaningless null. […] We must specify our alternatives before we commence an analysis, preferably at the same time we design our study." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

🖍️Alex Reinhart - Collected Quotes

"Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’. Just try several different analyses offered by your statistical software until one of them turns up an interesting result, and then pretend this is the analysis you intended to do all along. Without psychic powers, it’s almost impossible to tell when a published result was obtained through data torture." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"In science, it is important to limit two kinds of errors: false positives, where you conclude there is an effect when there isn’t, and false negatives, where you fail to notice a real effect. In some sense, false positives and false negatives are flip sides of the same coin. If we’re too ready to jump to conclusions about effects, we’re prone to get false positives; if we’re too conservative, we’ll err on the side of false negatives." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"In exploratory data analysis, you don’t choose a hypothesis to test in advance. You collect data and poke it to see what interesting details might pop out, ideally leading to new hypotheses and new experiments. This process involves making numerous plots, trying a few statistical analyses, and following any promising leads. But aimlessly exploring data means a lot of opportunities for false positives and truth inflation." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"In short, statistical significance does not mean your result has any practical significance. As for statistical insignificance, it doesn’t tell you much. A statistically insignificant difference could be nothing but noise, or it could represent a real effect that can be pinned down only with more data." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"More useful than a statement that an experiment’s results were statistically insignificant is a confidence interval giving plausible sizes for the effect. Even if the confidence interval includes zero, its width tells you a lot: a narrow interval covering zero tells you that the effect is most likely small (which may be all you need to know, if a small effect is not practically useful), while a wide interval clearly shows that the measurement was not precise enough to draw conclusions." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Much of experimental science comes down to measuring differences. [...] We use statistics to make judgments about these kinds of differences. We will always observe some difference due to luck and random variation, so statisticians talk about statistically significant differences when the difference is larger than could easily be produced by luck. So first we must learn how to make that decision." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Overlapping confidence intervals do not mean two values are not significantly different. Checking confidence intervals or standard errors will mislead. It’s always best to use the appropriate hypothesis test instead. Your eyeball is not a well-defined statistical procedure." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"The p value is the probability, under the assumption that there is no true effect or no true difference, of collecting data that shows a difference equal to or more extreme than what you actually observed. [...] Remember, a p value is not a measure of how right you are or how important a difference is. Instead, think of it as a measure of surprise." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"There is exactly one situation when visually checking confidence intervals works, and it is when comparing the confidence interval against a fixed value, rather than another confidence interval. If you want to know whether a number is plausibly zero, you may check to see whether its confidence interval overlaps with zero. There are, of course, formal statistical procedures that generate confidence intervals that can be compared by eye and that even correct for multiple comparisons automatically. Unfortunately, these procedures work only in certain circumstances;" (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"When statisticians are asked for an interesting paradoxical result in statistics, they often turn to Simpson’s paradox. Simpson’s paradox arises whenever an apparent trend in data, caused by a confounding variable, can be eliminated or reversed by splitting the data into natural groups." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

12 April 2006

🖍️Kaiser Fung - Collected Quotes

"Numbers already rule your world. And you must not be in the dark about this fact. See how some applied scientists use statistical thinking to make our lives better. You will be amazed how you can use numbers to make everyday decisions in your own life." (Kaiser Fung, "Numbers Rule the World", 2010)

"The issue of group differences is fundamental to statistical thinking. The heart of this matter concerns which groups should be aggregated and which shouldn’t." (Kaiser Fung, "Numbers Rule the World", 2010)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010) 

"Having NUMBERSENSE means: (•) Not taking published data at face value; (•) Knowing which questions to ask; (•) Having a nose for doctored statistics. [...] NUMBERSENSE is that bit of skepticism, urge to probe, and desire to verify. It’s having the truffle hog’s nose to hunt the delicacies. Developing NUMBERSENSE takes training and patience. It is essential to know a few basic statistical concepts. Understanding the nature of means, medians, and percentile ranks is important. Breaking down ratios into components facilitates clear thinking. Ratios can also be interpreted as weighted averages, with those weights arranged by rules of inclusion and exclusion. Missing data must be carefully vetted, especially when they are substituted with statistical estimates. Blatant fraud, while difficult to detect, is often exposed by inconsistency." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Measuring anything subjective always prompts perverse behavior. [...] All measurement systems are subject to abuse." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Missing data is the blind spot of statisticians. If they are not paying full attention, they lose track of these little details. Even when they notice, many unwittingly sway things our way. Most ranking systems ignore missing values." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"No subjective metric can escape strategic gaming [...] The possibility of mischief is bottomless. Fighting ratings is fruitless, as they satisfy a very human need. If one scheme is beaten down, another will take its place and wear its flaws. Big Data just deepens the danger. The more complex the rating formulas, the more numerous the opportunities there are to dress up the numbers. The larger the data sets, the harder it is to audit them." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"NUMBERSENSE is not taking numbers at face value. NUMBERSENSE is the ability to relate numbers here to numbers there, to separate the credible from the chimerical. It means drawing the dividing line between science hour and story time." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Statistical models in the social sciences rely on correlations, generally not causes, of our behavior. It is inevitable that such models of reality do not capture reality well. This explains the excess of false positives and false negatives." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Statistically speaking, the best predictive models are gems." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Statisticians set a high bar when they assign a cause to an effect. [...] A model that ignores cause–effect relationships cannot attain the status of a model in the physical sciences. This is a structural limitation that no amount of data - not even Big Data - can surmount." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"The urge to tinker with a formula is a hunger that keeps coming back. Tinkering almost always leads to more complexity. The more complicated the metric, the harder it is for users to learn how to affect the metric, and the less likely it is to improve it." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Until a new metric generates a body of data, we cannot test its usefulness. Lots of novel measures hold promise only on paper." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Usually, it is impossible to restate past data. As a result, all history must be whitewashed and measurement starts from scratch." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

🖍️Bart Kosko - Collected Quotes

"A bell curve shows the 'spread' or variance in our knowledge or certainty. The wider the bell the less we know. An infinitely wide bell is a flat line. Then we know nothing. The value of the quantity, position, or speed could lie anywhere on the axis. An infinitely narrow bell is a spike that is infinitely tall. Then we have complete knowledge of the value of the quantity. The uncertainty principle says that as one bell curve gets wider the other gets thinner. As one curve peaks the other spreads. So if the position bell curve becomes a spike and we have total knowledge of position, then the speed bell curve goes flat and we have total uncertainty (infinite variance) of speed." (Bart Kosko, "Fuzzy Thinking: The new science of fuzzy logic", 1993)

"Bivalence trades accuracy for simplicity. Binary outcomes of yes and no, white and black, true and false simplify math and computer processing. You can work with strings of 0s and 1s more easily than you can work with fractions. But bivalence requires some force fitting and rounding off [...] Bivalence holds at cube corners. Multivalence holds everywhere else." (Bart Kosko, "Fuzzy Thinking: The new science of fuzzy logic", 1993)

"Fuzziness has a formal name in science: multivalence. The opposite of fuzziness is bivalence or two-valuedness, two ways to answer each question, true or false, 1 or 0. Fuzziness means multivalence. It means three or more options, perhaps an infinite spectrum of options, instead of just two extremes. It means analog instead of binary, infinite shades of gray between black and white." (Bart Kosko, "Fuzzy Thinking: The new science of fuzzy logic", 1993)

"The binary logic of modern computers often falls short when describing the vagueness of the real world. Fuzzy logic offers more graceful alternatives." (Bart Kosko & Satoru Isaka, "Fuzzy Logic,” Scientific American Vol. 269, 1993)

"A bit involves both probability and an experiment that decides a binary or yes-no question. Consider flipping a coin. One bit of in-formation is what we learn from the flip of a fair coin. With an unfair or biased coin the odds are other than even because either heads or tails is more likely to appear after the flip. We learn less from flipping the biased coin because there is less surprise in the outcome on average. Shannon's bit-based concept of entropy is just the average information of the experiment. What we gain in information from the coin flip we lose in uncertainty or entropy." (Bart Kosko, "Noise", 2006)

"A signal has a finite-length frequency spectrum only if it lasts infinitely long in time. So a finite spectrum implies infinite time and vice versa. The reverse also holds in the ideal world of mathematics: A signal is finite in time only if it has a frequency spectrum that is infinite in extent." (Bart Kosko, "Noise", 2006)

"Bell curves don't differ that much in their bells. They differ in their tails. The tails describe how frequently rare events occur. They describe whether rare events really are so rare. This leads to the saying that the devil is in the tails." (Bart Kosko, "Noise", 2006)

"Chaos can leave statistical footprints that look like noise. This can arise from simple systems that are deterministic and not random. [...] The surprising mathematical fact is that most systems are chaotic. Change the starting value ever so slightly and soon the system wanders off on a new chaotic path no matter how close the starting point of the new path was to the starting point of the old path. Mathematicians call this sensitivity to initial conditions but many scientists just call it the butterfly effect. And what holds in math seems to hold in the real world - more and more systems appear to be chaotic." (Bart Kosko, "Noise", 2006)

"'Chaos' refers to systems that are very sensitive to small changes in their inputs. A minuscule change in a chaotic communication system can flip a 0 to a 1 or vice versa. This is the so-called butterfly effect: Small changes in the input of a chaotic system can produce large changes in the output. Suppose a butterfly flaps its wings in a slightly different way. can change its flight path. The change in flight path can in time change how a swarm of butterflies migrates." (Bart Kosko, "Noise", 2006)

"I wage war on noise every day as part of my work as a scientist and engineer. We try to maximize signal-to-noise ratios. We try to filter noise out of measurements of sounds or images or anything else that conveys information from the world around us. We code the transmission of digital messages with extra 0s and 1s to defeat line noise and burst noise and any other form of interference. We design sophisticated algorithms to track noise and then cancel it in headphones or in a sonogram. Some of us even teach classes on how to defeat this nemesis of the digital age. Such action further conditions our anti-noise reflexes." (Bart Kosko, "Noise", 2006)

"Linear systems do not benefit from noise because the output of a linear system is just a simple scaled version of the input [...] Put noise in a linear system and you get out noise. Sometimes you get out a lot more noise than you put in. This can produce explosive effects in feedback systems that take their own outputs as inputs." (Bart Kosko, "Noise", 2006)

"Many scientists who work not just with noise but with probability make a common mistake: They assume that a bell curve is automatically Gauss's bell curve. Empirical tests with real data can often show that such an assumption is false. The result can be a noise model that grossly misrepresents the real noise pattern. It also favors a limited view of what counts as normal versus non-normal or abnormal behavior. This assumption is especially troubling when applied to human behavior. It can also lead one to dismiss extreme data as error when in fact the data is part of a pattern." (Bart Kosko, "Noise", 2006)

"Noise is a signal we don't like. Noise has two parts. The first has to do with the head and the second with the heart. The first part is the scientific or objective part: Noise is a signal. [...] The second part of noise is the subjective part: It deals with values. It deals with how we draw the fuzzy line between good signals and bad signals. Noise signals are the bad signals. They are the unwanted signals that mask or corrupt our preferred signals. They not only interfere but they tend to interfere at random." (Bart Kosko, "Noise", 2006)

"Noise is an unwanted signal. A signal is anything that conveys information or ultimately anything that has energy. The universe consists of a great deal of energy. Indeed a working definition of the universe is all energy anywhere ever. So the answer turns on how one defines what it means to be wanted and by whom." (Bart Kosko, "Noise", 2006)

"The central limit theorem differs from laws of large numbers because random variables vary and so they differ from constants such as population means. The central limit theorem says that certain independent random effects converge not to a constant population value such as the mean rate of unemployment but rather they converge to a random variable that has its own Gaussian bell-curve description." (Bart Kosko, "Noise", 2006)

"The flaw in the classical thinking is the assumption that variance equals dispersion. Variance tends to exaggerate outlying data because it squares the distance between the data and their mean. This mathematical artifact gives too much weight to rotten apples. It can also result in an infinite value in the face of impulsive data or noise. [...] Yet dispersion remains an elusive concept. It refers to the width of a probability bell curve in the special but important case of a bell curve. But most probability curves don't have a bell shape. And its relation to a bell curve's width is not exact in general. We know in general only that the dispersion increases as the bell gets wider. A single number controls the dispersion for stable bell curves and indeed for all stable probability curves - but not all bell curves are stable curves." (Bart Kosko, "Noise", 2006)

More quotes from Bart Kosko at QuotableMath.blogspot.com.

🖍️Tim Harford - Collected Quotes

"An algorithm, meanwhile, is a step-by-step recipe for performing a series of actions, and in most cases 'algorithm' means simply 'computer program'." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Big data is revolutionizing the world around us, and it is easy to feel alienated by tales of computers handing down decisions made in ways we don’t understand. I think we’re right to be concerned. Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Each decision about what data to gather and how to analyze them is akin to standing on a pathway as it forks left and right and deciding which way to go. What seems like a few simple choices can quickly multiply into a labyrinth of different possibilities. Make one combination of choices and you’ll reach one conclusion; make another, equally reasonable, and you might find a very different pattern in the data." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Each of us is sweating data, and those data are being mopped up and wrung out into oceans of information. Algorithms and large datasets are being used for everything from finding us love to deciding whether, if we are accused of a crime, we go to prison before the trial or are instead allowed to post bail. We all need to understand what these data are and how they can be exploited." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Good statistics are not a trick, although they are a kind of magic. Good statistics are not smoke and mirrors; in fact, they help us see more clearly. Good statistics are like a telescope for an astronomer, a microscope for a bacteriologist, or an X-ray for a radiologist. If we are willing to let them, good statistics help us see things about the world around us and about ourselves - both large and small - that we would not be able to see in any other way." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Ideally, a decision maker or a forecaster will combine the outside view and the inside view - or, similarly, statistics plus personal experience. But it’s much better to start with the statistical view, the outside view, and then modify it in the light of personal experience than it is to go the other way around. If you start with the inside view you have no real frame of reference, no sense of scale - and can easily come up with a probability that is ten times too large, or ten times too small." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"If we don’t understand the statistics, we’re likely to be badly mistaken about the way the world is. It is all too easy to convince ourselves that whatever we’ve seen with our own eyes is the whole truth; it isn’t. Understanding causation is tough even with good statistics, but hopeless without them. [...] And yet, if we understand only the statistics, we understand little. We need to be curious about the world that we see, hear, touch, and smell, as well as the world we can examine through a spreadsheet." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"[…] in a world where so many people seem to hold extreme views with strident certainty, you can deflate somebody’s overconfidence and moderate their politics simply by asking them to explain the details." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"It’d be nice to fondly imagine that high-quality statistics simply appear in a spreadsheet somewhere, divine providence from the numerical heavens. Yet any dataset begins with somebody deciding to collect the numbers. What numbers are and aren’t collected, what is and isn’t measured, and who is included or excluded are the result of all-too-human assumptions, preconceptions, and oversights." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Making big data work is harder than it seems. Statisticians have spent the past two hundred years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster, and cheaper these days, but we must not pretend that the traps have all been made safe. They have not." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Many people have strong intuitions about whether they would rather have a vital decision about them made by algorithms or humans. Some people are touchingly impressed by the capabilities of the algorithms; others have far too much faith in human judgment. The truth is that sometimes the algorithms will do better than the humans, and sometimes they won’t. If we want to avoid the problems and unlock the promise of big data, we’re going to need to assess the performance of the algorithms on a case-by-case basis. All too often, this is much harder than it should be. […] So the problem is not the algorithms, or the big datasets. The problem is a lack of scrutiny, transparency, and debate." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Much of the data visualization that bombards us today is decoration at best, and distraction or even disinformation at worst. The decorative function is surprisingly common, perhaps because the data visualization teams of many media organizations are part of the art departments. They are led by people whose skills and experience are not in statistics but in illustration or graphic design. The emphasis is on the visualization, not on the data. It is, above all, a picture." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Numbers can easily confuse us when they are unmoored from a clear definition." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Premature enumeration is an equal-opportunity blunder: the most numerate among us may be just as much at risk as those who find their heads spinning at the first mention of a fraction. Indeed, if you’re confident with numbers you may be more prone than most to slicing and dicing, correlating and regressing, normalizing and rebasing, effortlessly manipulating the numbers on the spreadsheet or in the statistical package - without ever realizing that you don’t fully understand what these abstract quantities refer to. Arguably this temptation lay at the root of the last financial crisis: the sophistication of mathematical risk models obscured the question of how, exactly, risks were being measured, and whether those measurements were something you’d really want to bet your global banking system on." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Sample error reflects the risk that, purely by chance, a randomly chosen sample of opinions does not reflect the true views of the population. The 'margin of error' reported in opinion polls reflects this risk, and the larger the sample, the smaller the margin of error. […] sampling error has a far more dangerous friend: sampling bias. Sampling error is when a randomly chosen sample doesn’t reflect the underlying population purely by chance; sampling bias is when the sample isn’t randomly chosen at all." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Statistical metrics can show us facts and trends that would be impossible to see in any other way, but often they’re used as a substitute for relevant experience, by managers or politicians without specific expertise or a close-up view." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Statisticians are sometimes dismissed as bean counters. The sneering term is misleading as well as unfair. Most of the concepts that matter in policy are not like beans; they are not merely difficult to count, but difficult to define. Once you’re sure what you mean by 'bean', the bean counting itself may come more easily. But if we don’t understand the definition, then there is little point in looking at the numbers. We have fooled ourselves before we have begun."(Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"So information is beautiful - but misinformation can be beautiful, too. And producing beautiful misinformation is becoming easier than ever." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The contradiction between what we see with our own eyes and what the statistics claim can be very real. […] The truth is more complicated. Our personal experiences should not be dismissed along with our feelings, at least not without further thought. Sometimes the statistics give us a vastly better way to understand the world; sometimes they mislead us. We need to be wise enough to figure out when the statistics are in conflict with everyday experience - and in those cases, which to believe." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The world is full of patterns that are too subtle or too rare to detect by eyeballing them, and a pattern doesn’t need to be very subtle or rare to be hard to spot without a statistical lens." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The whole discipline of statistics is built on measuring or counting things. […] it is important to understand what is being measured or counted, and how. It is surprising how rarely we do this. Over the years, as I found myself trying to lead people out of statistical mazes week after week, I came to realize that many of the problems I encountered were because people had taken a wrong turn right at the start. They had dived into the mathematics of a statistical claim - asking about sampling errors and margins of error, debating if the number is rising or falling, believing, doubting, analyzing, dissecting - without taking the ti- me to understand the first and most obvious fact: What is being measured, or counted? What definition is being used?" (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Those of us in the business of communicating ideas need to go beyond the fact-check and the statistical smackdown. Facts are valuable things, and so is fact-checking. But if we really want people to understand complex issues, we need to engage their curiosity. If people are curious, they will learn." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Unless we’re collecting data ourselves, there’s a limit to how much we can do to combat the problem of missing data. But we can and should remember to ask who or what might be missing from the data we’re being told about. Some missing numbers are obvious […]. Other omissions show up only when we take a close look at the claim in question." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"We don’t need to become emotionless processors of numerical information - just noticing our emotions and taking them into account may often be enough to improve our judgment. Rather than requiring superhuman control over our emotions, we need simply to develop good habits." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"We filter new information. If it accords with what we expect, we’ll be more likely to accept it. […] Our brains are always trying to make sense of the world around us based on incomplete information. The brain makes predictions about what it expects, and tends to fill in the gaps, often based on surprisingly sparse data." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"We should conclude nothing because that pair of numbers alone tells us very little. If we want to understand what’s happening, we need to step back and take in a broader perspective." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"[…] when it comes to interpreting the world around us, we need to realize that our feelings can trump our expertise. […] The more extreme the emotional reaction, the harder it is to think straight. […] It is not easy to master our emotions while assessing information that matters to us, not least because our emotions can lead us astray in different directions. […] We often find ways to dismiss evidence that we don’t like. And the opposite is true, too: when evidence seems to support our preconceptions, we are less likely to look too closely for flaws." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"When we are trying to understand a statistical claim - any statistical claim - we need to start by asking ourselves what the claim actually means. [...] A surprising statistical claim is a challenge to our existing worldview. It may provoke an emotional response - even a fearful one." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020) 

🖍️Nikola K Kasabov - Collected Quotes

"A strategy is usually expressed by a set of heuristic rules. The heuristic rules ease the process of searching for an optimal solution. The process is usually iterative and at one step either the global optimum for the whole problem (state) space is found and the process stops, or a local optimum for a subspace of the state space of the problem is found and the problem continues, if it is possible to improve." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Adaptation is the process of changing a system during its operation in a dynamically changing environment. Learning and interaction are elements of this process. Without adaptation there is no intelligence." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

 "An artificial neural network (or simply a neural network) is a biologically inspired computational model that consists of processing elements (neurons) and connections between them, as well as of training and recall algorithms." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Artificial intelligence comprises methods, tools, and systems for solving problems that normally require the intelligence of humans. The term intelligence is always defined as the ability to learn effectively, to react adaptively, to make proper decisions, to communicate in language or images in a sophisticated way, and to understand." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996) 

"Data obtained without any external disturbance or corruption are called clean; noisy data mean that a small random ingredient is added to the clean data." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Fuzzy systems are excellent tools for representing heuristic, commonsense rules. Fuzzy inference methods apply these rules to data and infer a solution. Neural networks are very efficient at learning heuristics from data. They are 'good problem solvers' when past data are available. Both fuzzy systems and neural networks are universal approximators in a sense, that is, for a given continuous objective function there will be a fuzzy system and a neural network which approximate it to any degree of accuracy." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Fuzzy systems are rule-based expert systems based on fuzzy rules and fuzzy inference. Fuzzy rules represent in a straightforward way 'commonsense' knowledge and skills, or knowledge that is subjective, ambiguous, vague, or contradictory." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Generalization is the process of matching new, unknown input data with the problem knowledge in order to obtain the best possible solution, or one close to it. Generalization means reacting properly to new situations, for example, recognizing new images, or classifying new objects and situations. Generalization can also be described as a transition from a particular object description to a general concept description. This is a major characteristic of all intelligent systems." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996) 

"Generally speaking, problem knowledge for solving a given problem may consist of heuristic rules or formulas that comprise the explicit knowledge, and past-experience data that comprise the implicit, hidden knowledge. Knowledge represents links between the domain space and the solution space, the space of the independent variables and the space of the dependent variables." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Heuristic (it is of Greek origin) means discovery. Heuristic methods are based on experience, rational ideas, and rules of thumb. Heuristics are based more on common sense than on mathematics. Heuristics are useful, for example, when the optimal solution needs an exhaustive search that is not realistic in terms of time. In principle, a heuristic does not guarantee the best solution, but a heuristic solution can provide a tremendous shortcut in cost and time." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Heuristic methods may aim at local optimization rather than at global optimization, that is, the algorithm optimizes the solution stepwise, finding the best solution at each small step of the solution process and 'hoping' that the global solution, which comprises the local ones, would be satisfactory." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Inference is the process of matching current facts from the domain space to the existing knowledge and inferring new facts. An inference process is a chain of matchings. The intermediate results obtained during the inference process are matched against the existing knowledge. The length of the chain is different. It depends on the knowledge base and on the inference method applied." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Learning is the process of obtaining new knowledge. It results in a better reaction to the same inputs at the next session of operation. It means improvement. It is a step toward adaptation. Learning is a major characteristic of intelligent systems." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Prediction (forecasting) is the process of generating information for the possible future development of a process from data about its past and its present development." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Representation is the process of transforming existing problem knowledge to some of the known knowledge-engineering schemes in order to process it by applying knowledge-engineering methods. The result of the representation process is the problem knowledge base in a computer format." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"The most distinguishing property of fuzzy logic is that it deals with fuzzy propositions, that is, propositions which contain fuzzy variables and fuzzy values, for example, 'the temperature is high', 'the height is short'. The truth values for fuzzy propositions are not TRUE/FALSE only, as is the case in propositional boolean logic, but include all the grayness between two extreme values." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Validation is the process of testing how good the solutions produced by a system are. The results produced by a system are usually compared with the results obtained either by experts or by other systems. Validation is an extremely important part of the process of developing every knowledge-based system. Without comparing the results produced by the system with reality, there is little point in using it." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

11 April 2006

🖍️Matthew Kirk - Collected Quotes

"A good proxy for complexity in a machine learning model is how fast it takes to train it." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Cross-validation is a method of splitting all of your data into two parts: training and validation. The training data is used to build the machine learning model, whereas the validation data is used to validate that the model is doing what is expected. This increases our ability to find and determine the underlying errors in a model." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"In statistics, there is a measure called power that denotes the probability of not finding a false negative. As power goes up, false negatives go down. However, what influences this measure is the sample size. If our sample size is too small, we just don’t have enough information to come up with a good solution." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Machine learning is a science and requires an objective approach to problems. Just like the scientific method, test-driven development can aid in solving a problem. The reason that TDD and the scientific method are so similar is because of these three shared characteristics: Both propose that the solution is logical and valid. Both share results through documentation and work over time. Both work in feedback loops." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Machine learning is the intersection between theoretically sound computer science and practically noisy data. Essentially, it’s about machines making sense out of data in much the same way that humans do." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Machine learning is well suited for the unpredictable future, because most algorithms learn from new information. But as new information is found, it can also come in unstable forms, and new issues can arise that weren’t thought of before. We don’t know what we don’t know. When processing new information, it’s sometimes hard to tell whether our model is working." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Precision and recall are ways of monitoring the power of the machine learning implementation. Precision is a metric that monitors the percentage of true positives. […] Recall is the ratio of true positives to true positive plus false negatives." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Supervised learning, or function approximation, is simply fitting data to a function of any variety.  […] Unsupervised learning involves figuring out what makes the data special. […] Reinforcement learning involves figuring out how to play a multistage game with rewards and payoffs. Think of it as the algorithms that optimize the life of something." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Underfitting is when a model doesn’t take into account enough information to accurately model real life. For example, if we observed only two points on an exponential curve, we would probably assert that there is a linear relationship there. But there may not be a pattern, because there are only two points to reference. [...] It seems that the best way to mitigate underfitting a model is to give it more information, but this actually can be a problem as well. More data can mean more noise and more problems. Using too much data and too complex of a model will yield something that works for that particular data set and nothing else." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

🖍️DeWayne R Derryberry - Collected Quotes

"A complete data analysis will involve the following steps: (i) Finding a good model to fit the signal based on the data. (ii) Finding a good model to fit the noise, based on the residuals from the model. (iii) Adjusting variances, test statistics, confidence intervals, and predictions, based on the model for the noise.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"A key difference between a traditional statistical problems and a time series problem is that often, in time series, the errors are not independent." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"A stationary time series is one that has had trend elements (the signal) removed and that has a time invariant pattern in the random noise. In other words, although there is a pattern of serial correlation in the noise, that pattern seems to mimic a fixed mathematical model so that the same model fits any arbitrary, contiguous subset of the noise." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

"A wide variety of statistical procedures (regression, t-tests, ANOVA) require three assumptions: (i) Normal observations or errors. (ii) Independent observations (or independent errors, which is equivalent, in normal linear models to independent observations). (iii) Equal variance - when that is appropriate (for the one-sample t-test, for example, there is nothing being compared, so equal variances do not apply).(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Both real and simulated data are very important for data analysis. Simulated data is useful because it is known what process generated the data. Hence it is known what the estimated signal and noise should look like (simulated data actually has a well-defined signal and well-defined noise). In this setting, it is possible to know, in a concrete manner, how well the modeling process has worked." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

"Either a logarithmic or a square-root transformation of the data would produce a new series more amenable to fit a simple trigonometric model. It is often the case that periodic time series have rounded minima and sharp-peaked maxima. In these cases, the square root or logarithmic transformation seems to work well most of the time.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"For a confidence interval, the central limit theorem plays a role in the reliability of the interval because the sample mean is often approximately normal even when the underlying data is not. A prediction interval has no such protection. The shape of the interval reflects the shape of the underlying distribution. It is more important to examine carefully the normality assumption by checking the residuals […].(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"If the observations/errors are not independent, the statistical formulations are completely unreliable unless corrections can be made.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Not all data sets lend themselves to data splitting. The data set may be too small to split and/or the fitted model may be a local smoother. In the first case, there is too little data upon which to build a model if the data is split; and in the second case, it is not expected the model for any part of the data to directly interpolate/extrapolate to any other part of the model. For these cases, a different approach to cross-validation is possible, something similar to bootstrapping." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

"Once a model has been fitted to the data, the deviations from the model are the residuals. If the model is appropriate, then the residuals mimic the true errors. Examination of the residuals often provides clues about departures from the modeling assumptions. Lack of fit - if there is curvature in the residuals, plotted versus the fitted values, this suggests there may be whole regions where the model overestimates the data and other whole regions where the model underestimates the data. This would suggest that the current model is too simple relative to some better model.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Prediction about the future assumes that the statistical model will continue to fit future data. There are several reasons this is often implausible, but it also seems clear that the model will often degenerate slowly in quality, so that the model will fit data only a few periods in the future almost as well as the data used to fit the model. To some degree, the reliability of extrapolation into the future involves subject-matter expertise.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"[The normality] assumption is the least important one for the reliability of the statistical procedures under discussion. Violations of the normality assumption can be divided into two general forms: Distributions that have heavier tails than the normal and distributions that are skewed rather than symmetric. If data is skewed, the formulas we are discussing are still valid as long as the sample size is sufficiently large. Although the guidance about 'how skewed' and 'how large a sample' can be quite vague, since the greater the skew, the larger the required sample size. For the data commonly used in time series and for the sample sizes (which are generally quite large) used, skew is not a problem. On the other hand, heavy tails can be very problematic." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

 "The random element in most data analysis is assumed to be white noise - normal errors independent of each other. In a time series, the errors are often linked so that independence cannot be assumed (the last examples). Modeling the nature of this dependence is the key to time series.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Transformations of data alter statistics. For example, the mean of a data set can be found, but it is not easy to relate the mean of a data set to the mean of the logarithm of that data set. The median is far friendlier to transformations. If the median of a data set is found, then the logarithm of the data set is analyzed; the median of the log transformed data will be the log of the original median.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014) 

"When data is not normal, the reason the formulas are working is usually the central limit theorem. For large sample sizes, the formulas are producing parameter estimates that are approximately normal even when the data is not itself normal. The central limit theorem does make some assumptions and one is that the mean and variance of the population exist. Outliers in the data are evidence that these assumptions may not be true. Persistent outliers in the data, ones that are not errors and cannot be otherwise explained, suggest that the usual procedures based on the central limit theorem are not applicable.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Whenever the data is periodic, at some level, there are only as many observations as the number of complete periods. This global feature of the data suggests caution in understanding more detailed features of the data. While a curvature model might be appropriate for this data, there is too little data to know this, and some skepticism might be in order if such a model were fitted to the data." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

🖍️Erik J Larson - Collected Quotes

"A well-known theorem called the 'no free lunch' theorem proves exactly what we anecdotally witness when designing and building learning systems. The theorem states that any bias-free learning system will perform no better than chance when applied to arbitrary problems. This is a fancy way of stating that designers of systems must give the system a bias deliberately, so it learns what’s intended. As the theorem states, a truly bias- free system is useless." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"First, intelligence is situational - there is no such thing as general intelligence. Your brain is one piece in a broader system which includes your body, your environment, other humans, and culture as a whole. Second, it is contextual - far from existing in a vacuum, any individual intelligence will always be both defined and limited by its environment. (And currently, the environment, not the brain, is acting as the bottleneck to intelligence.) Third, human intelligence is largely externalized, contained not in your brain but in your civilization. Think of individuals as tools, whose brains are modules in a cognitive system much larger than themselves - a system that is self-improving and has been for a long time." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Inference is to bring about a new thought, which in logic amounts to drawing a conclusion, and more generally involves using what we already know, and what we see or observe, to update prior beliefs. […] Inference is also a leap of sorts, deemed reasonable […] Inference is a basic cognitive act for intelligent minds. If a cognitive agent (a person, an AI system) is not intelligent, it will infer badly. But any system that infers at all must have some basic intelligence, because the very act of using what is known and what is observed to update beliefs is inescapably tied up with what we mean by intelligence. If an AI system is not inferring at all, it doesn’t really deserve to be called AI." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Machine learning bias is typically understood as a source of learning error, a technical problem. […] Machine learning bias can introduce error simply because the system doesn’t 'look' for certain solutions in the first place. But bias is actually necessary in machine learning - it’s part of learning itself." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"People who assume that extensions of modern machine learning methods like deep learning will somehow 'train up', or learn to be intelligent like humans, do not understand the fundamental limitations that are already known. Admitting the necessity of supplying a bias to learning systems is tantamount to Turing’s observing that insights about mathematics must be supplied by human minds from outside formal methods, since machine learning bias is determined, prior to learning, by human designers." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"[...] the focus on Big Data AI seems to be an excuse to put forth a number of vague and hand-waving theories, where the actual details and the ultimate success of neuroscience is handed over to quasi- mythological claims about the powers of large datasets and inductive computation. Where humans fail to illuminate a complicated domain with testable theory, machine learning and big data supposedly can step in and render traditional concerns about finding robust theories. This seems to be the logic of Data Brain efforts today." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"The idea that we can predict the arrival of AI typically sneaks in a premise, to varying degrees acknowledged, that successes on narrow AI systems like playing games will scale up to general intelligence, and so the predictive line from artificial intelligence to artificial general intelligence can be drawn with some confidence. This is a bad assumption, both for encouraging progress in the field toward artificial general intelligence, and for the logic of the argument for prediction." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"The problem-solving view of intelligence helps explain the production of invariably narrow applications of AI throughout its history. Game playing, for instance, has been a source of constant inspiration for the development of advanced AI techniques, but games are simplifications of life that reward simplified views of intelligence. […] Treating intelligence as problem-solving thus gives us narrow applications." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"To accomplish their goals, what are now called machine learning systems must each learn something specific. Researchers call this giving the machine a 'bias'. […] A bias in machine learning means that the system is designed and tuned to learn something. But this is, of course, just the problem of producing narrow problem-solving applications." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

10 April 2006

🖍️Steven S Skiena - Collected Quotes

"Bias is error from incorrect assumptions built into the model, such as restricting an interpolating function to be linear instead of a higher-order curve. [...] Errors of bias produce underfit models. They do not fit the training data as tightly as possible, were they allowed the freedom to do so. In popular discourse, I associate the word 'bias' with prejudice, and the correspondence is fairly apt: an apriori assumption that one group is inferior to another will result in less accurate predictions than an unbiased one. Models that perform lousy on both training and testing data are underfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Exploratory data analysis is the search for patterns and trends in a given data set. Visualization techniques play an important part in this quest. Looking carefully at your data is important for several reasons, including identifying mistakes in collection/processing, finding violations of statistical assumptions, and suggesting interesting hypotheses." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"The advent of massive data sets is changing in the way science is done. The traditional scientific method is hypothesis driven. The researcher formulates a theory of how the world works, and then seeks to support or reject this hypothesis based on data. By contrast, data-driven science starts by assembling a substantial data set, and then hunts for patterns that ideally will play the role of hypotheses for future analysis." (Steven S Skiena, "The Data Science Design Manual", 2017)

"The danger of overfitting is particularly severe when the training data is not a perfect gold standard. Human class annotations are often subjective and inconsistent, leading boosting to amplify the noise at the expense of the signal. The best boosting algorithms will deal with overfitting though regularization. The goal will be to minimize the number of non-zero coefficients, and avoid large coefficients that place too much faith in any one classifier in the ensemble." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Using noise (the uncorrelated variables) to fit noise (the residual left from a simple model on the genuinely correlated variables) is asking for trouble." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Variables which follow symmetric, bell-shaped distributions tend to be nice as features in models. They show substantial variation, so they can be used to discriminate between things, but not over such a wide range that outliers are overwhelming." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)
