18 April 2006

🖍️Charu C Aggarwal - Collected Quotes

"A major advantage of probabilistic models is that they can be easily applied to virtually any data type (or mixed data type), as long as an appropriate generative model is available for each mixture component. [...] A downside of probabilistic models is that they try to fit the data to a particular kind of distribution, which may often not be appropriate for the underlying data. Furthermore, as the number of model parameters increases, over-fitting becomes more common. In such cases, the outliers may fit the underlying model of normal data. Many parametric models are also harder to interpret in terms of intensional knowledge, especially when the parameters of the model cannot be intuitively presented to an analyst in terms of underlying attributes. This can defeat one of the important purposes of anomaly detection, which is to provide diagnostic understanding of the abnormal data generative process." (Charu C Aggarwal, "Outlier Analysis", 2013)

"An attempt to use the wrong model for a given data set is likely to provide poor results. Therefore, the core principle of discovering outliers is based on assumptions about the structure of the normal patterns in a given data set. Clearly, the choice of the 'normal' model depends highly upon the analyst’s understanding of the natural data patterns in that particular domain." (Charu C Aggarwal, "Outlier Analysis", 2013)

"Dimensionality reduction and regression modeling are particularly hard to interpret in terms of original attributes, when the underlying data dimensionality is high. This is because the subspace embedding is defined as a linear combination of attributes with positive or negative coefficients. This cannot easily be intuitively interpreted in terms specific properties of the data attributes." (Charu C Aggarwal, "Outlier Analysis", 2013)

"Typically, most outlier detection algorithms use some quantified measure of the outlierness of a data point, such as the sparsity of the underlying region, nearest neighbor based distance, or the fit to the underlying data distribution. Every data point lies on a continuous spectrum from normal data to noise, and finally to anomalies [...] The separation of the different regions of this spectrum is often not precisely defined, and is chosen on an ad-hoc basis according to application-specific criteria. Furthermore, the separation between noise and anomalies is not pure, and many data points created by a noisy generative process may be deviant enough to be interpreted as anomalies on the basis of the outlier score. Thus, anomalies will typically have a much higher outlier score than noise, but this is not a distinguishing factor between the two as a matter of definition. Rather, it is the interest of the analyst, which regulates the distinction between noise and an anomaly." (Charu C Aggarwal, "Outlier Analysis", 2013) 

"Even though a natural way of avoiding overfitting is to simply build smaller networks (with fewer units and parameters), it has often been observed that it is better to build large networks and then regularize them in order to avoid overfitting. This is because large networks retain the option of building a more complex model if it is truly warranted. At the same time, the regularization process can smooth out the random artifacts that are not supported by sufficient data. By using this approach, we are giving the model the choice to decide what complexity it needs, rather than making a rigid decision for the model up front (which might even underfit the data)." (Charu C Aggarwal, "Neural Networks and Deep Learning: A Textbook", 2018)

"Regularization is particularly important when the amount of available data is limited. A neat biological interpretation of regularization is that it corresponds to gradual forgetting, as a result of which 'less important' (i.e., noisy) patterns are removed. In general, it is often advisable to use more complex models with regularization rather than simpler models without regularization." (Charu C Aggarwal, "Neural Networks and Deep Learning: A Textbook", 2018)

"The high generalization error in a neural network may be caused by several reasons. First, the data itself might have a lot of noise, in which case there is little one can do in order to improve accuracy. Second, neural networks are hard to train, and the large error might be caused by the poor convergence behavior of the algorithm. The error might also be caused by high bias, which is referred to as underfitting. Finally, overfitting (i.e., high variance) may cause a large part of the generalization error. In most cases, the error is a combination of more than one of these different factors." (Charu C Aggarwal, "Neural Networks and Deep Learning: A Textbook", 2018)

"The idea behind deeper architectures is that they can better leverage repeated regularities in the data patterns in order to reduce the number of computational units and therefore generalize the learning even to areas of the data space where one does not have examples. Often these repeated regularities are learned by the neural network within the weights as the basis vectors of hierarchical features." (Charu C Aggarwal, "Neural Networks and Deep Learning: A Textbook", 2018)

"A key point is that an increased number of attributes relative to training points provides additional degrees of freedom to the optimization problem, as a result of which irrelevant solutions become more likely. Therefore, a natural solution is to add a penalty for using additional features." (Charu C Aggarwal, "Artificial Intelligence: A Textbook", 2021)

"In general, the more complex the data, the more the analyst has to make prior inferences of what is considered normal for modeling purposes." (Charu C Aggarwal, "Artificial Intelligence: A Textbook", 2021)

"The ability to go beyond human domain knowledge is usually achieved by inductive learning methods that are unfettered from the imperfections in the domain knowledge of deductive methods." (Charu C Aggarwal, "Artificial Intelligence: A Textbook", 2021)

"The Monte Carlo tree search method is naturally suited to non-deterministic settings such as card games or backgammon. Minimax trees are not well suited to non-deterministic settings because of the inability to predict the opponent’s moves while building the tree. On the other hand, Monte Carlo tree search is naturally suited to handling such settings, since the desirability of moves is always evaluated in an expected sense. The randomness in the game can be naturally combined with the randomness in move sampling in order to learn the expected outcomes from each choice of move." (Charu C Aggarwal, "Artificial Intelligence: A Textbook", 2021)

🖍️Umesh R Hodeghatta - Collected Quotes

"A histogram represents the frequency distribution of the data. Histograms are similar to bar charts but group numbers into ranges. Also, a histogram lets you show the frequency distribution of continuous data. This helps in analyzing the distribution (for example, normal or Gaussian), any outliers present in the data, and skewness." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Bias occurs normally when the model is underfitted and has failed to learn enough from the training data. It is the difference between the mean of the probability distribution and the actual correct value. Hence, the accuracy of the model is different for different data sets (test and training sets). To reduce the bias error, data scientists repeat the model-building process by resampling the data to obtain better prediction values." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Clustering analysis is performed on data to identify hidden groups or to form different sectors. The objective of the clusters is to enable meaningful analysis in ways that help business. Clustering can uncover previously undetected relationships in a data set." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Correlation explains the extent of change in one of the variables given the unit change in the value of another variable. Correlation assumes a very significant role in statistics and hence in the field of business analytics as any business cannot make any decision without understanding the relationship between various forces acting in favor of or against it." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Graphs represent data visually and provide more details about the data, enabling you to identify outliers in the data, distribute data for each column variable, provide a statistical description of the data, and present the relationship between the two or more variables." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"If either bias or variance is high, the model can be very far off from reality. In general, there is a trade-off between bias and variance. The goal of any machine-learning algorithm is to achieve low bias and low variance such that it gives good prediction performance. In reality, because of so many other hidden parameters in the model, it is hard to calculate the real bias and variance error. Nevertheless, the bias and variance provide a measure to understand the behavior of the machine-learning algorithm so that the model model can be adjusted to provide good prediction performance." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"In machine learning, a model is defined as a function, and we describe the learning function from the training data as inductive learning. Generalization refers to how well the concepts are learned by the model by applying them to data not seen before. The goal of a good machine-learning model is to reduce generalization errors and thus make good predictions on data that the model has never seen." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Machine learning is about making computers learn and perform tasks better based on past historical data. Learning is always based on observations from the data available. The emphasis is on making computers build mathematical models based on that learning and perform tasks automatically without the intervention of humans." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Overfitting and underfitting are two important factors that could impact the performance of machine-learning models. Overfitting occurs when the model performs well with training data and poorly with test data. Underfitting occurs when the model is so simple that it performs poorly with both training and test data. [...]  When the model does not capture and fit the data, it results in poor performance. We call this underfitting. Underfitting is the result of a poor model that typically does not perform well for any data." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Variance is a prediction error due to different sets of training samples. Ideally, the error should not vary from one training sample to another sample, and the model should be stable enough to handle hidden variations between input and output variables. Normally this occurs with the overfitted model." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

🖍️George B Dantzig - Collected Quotes

"All such problems can be formulated as mathematical programming problems. Naturally, we can propose many sophisticated algorithms and a theory but the final test of a theory is its capacity to solve the problems which originated it." (George B Dantzig, "Linear Programming and Extensions", 1963)

"If the system exhibits a structure which can be represented by a mathematical equivalent, called a mathematical model, and if the objective can be also so quantified, then some computational method may be evolved for choosing the best schedule of actions among alternatives. Such use of mathematical models is termed mathematical programming." (George B Dantzig, "Linear Programming and Extensions", 1963)

"Linear programming is viewed as a revolutionary development giving man the ability to state general objectives and to find, by means of the simplex method, optimal policy decisions for a broad class of practical decision problems of great complexity. In the real world, planning tends to be ad hoc because of the many special-interest groups with their multiple objectives." (George B Dantzig, "Mathematical Programming: The state of the art", 1983)

"Linear programming and its generalization, mathematical programming, can be viewed as part of a great revolutionary development that has given mankind the ability to state general goals and lay out a path of detailed decisions to be taken in order to 'best' achieve these goals when faced with practical situations of great complexity. The tools for accomplishing this are the models that formulate real-world problems in detailed mathematical terms, the algorithms that solve the models, and the software that execute the algorithms on computers based on the mathematical theory." (George B Dantzig & Mukund N Thapa, "Linear Programming" Vol I, 1997)

"Linear programming is concerned with the maximization or minimization of a linear objective function in many variables subject to linear equality and inequality constraints."  (George B Dantzig & Mukund N Thapa, "Linear Programming" Vol I, 1997)

"Mathematical programming (or optimization theory) is that branch of mathematics dealing with techniques for maximizing or minimizing an objective function subject to linear, nonlinear, and integer constraints on the variables."  (George B Dantzig & Mukund N Thapa, "Linear Programming" Vol I, 1997)

"Models of the real world are not always easy to formulate because of the richness, variety, and ambiguity that exists in the real world or because of our ambiguous understanding of it." (George B Dantzig & Mukund N Thapa, "Linear Programming" Vol I, 1997)

"The linear programming problem is to determine the values of the variables of the system that (a) are nonnegative or satisfy certain bounds, (b) satisfy a system  of linear constraints, and (c) minimize or maximize a linear form in the variables called an objective." (George B Dantzig & Mukund N Thapa, "Linear Programming" Vol I, 1997)

"The mathematical model of a system is the collection of mathematical relationships which, for the purpose of developing a design or plan, characterize the set of feasible solutions of the system." (George B Dantzig & Mukund N Thapa, "Linear Programming" Vol I, 1997)

17 April 2006

🖍️Benjamin Bengfort - Collected Quotes

"Deep learning broadly describes the large family of neural network architectures that contain multiple, interacting hidden layers." (Benjamin Bengfort et al, Applied Text Analysis with Python, 2018)

"Graphs can embed complex semantic representations in a compact form. As such, modeling data as networks of related entities is a powerful mechanism for analytics, both for visual analyses and machine learning. Part of this power comes from performance advantages of using a graph data structure, and the other part comes from an inherent human ability to intuitively interact with small networks." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"In essence, deep learning models are just chains of functions, which means that many deep learning libraries tend to have a functional or verbose, declarative style." (Benjamin Bengfort et al, Applied Text Analysis with Python, 2018)

"Language is unstructured data that has been produced by people to be understood by other people. By contrast, structured or semistructured data includes fields or markup that enable it to be easily parsed by a computer. However, while it does not feature an easily machine-readable structure, unstructured data is not random. On the contrary, it is governed by linguistic properties that make it very understandable to other people." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"Machine learning is often associated with the automation of decision making, but in practice, the process of constructing a predictive model generally requires a human in the loop. While computers are good at fast, accurate numerical computation, humans are instinctively and instantly able to identify patterns. The bridge between these two necessary skill sets lies in visualization - the precise and accurate rendering of data by a computer in visual terms and the immediate assignation of meaning to that data by humans." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"Many model families suffer from 'the curse of dimensionality'; as the feature space increases in dimensions, the data becomes more sparse and less informative to the underlying decision space." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"Neural networks refer to a family of models that are defined by an input layer (a vectorized representation of input data), a hidden layer that consists of neurons and synapses, and an output layer with the predicted values. Within the hidden layer, synapses transmit signals between neurons, which rely on an activation function to buffer incoming signals. The synapses apply weights to incoming values, and the activation function determines if the weighted inputs are sufficiently high to activate the neuron and pass the values on to the next layer of the network." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"The current trade-offs between traditional models and neural networks concern two factors: model complexity and speed. Because neural networks tend to take longer to train, they can impede rapid iteration [...] Neural networks are also typically more complex than traditional models, meaning that their hyperparameters are more difficult to tune and modeling errors are more challenging to diagnose. However, neural networks are not only increasingly practical, they also promise nontrivial performance gains over traditional models. This is because unlike traditional models, which face performance plateaus even as more data become available, neural models continue to improve." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"The premise of classification is simple: given a categorical target variable, learn patterns that exist between instances composed of independent variables and their relationship to the target. Because the target is given ahead of time, classification is said to be supervised machine learning because a model can be trained to minimize error between predicted and actual categories in the training data. Once a classification model is fit, it assigns categorical labels to new instances based on the patterns detected during training." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"The trick is to walk the line between underfitting and overfitting. An underfit model has low variance, generally making the same predictions every time, but with extremely high bias, because the model deviates from the correct answer by a significant amount. Underfitting is symptomatic of not having enough data points, or not training a complex enough model. An overfit model, on the other hand, has memorized the training data and is completely accurate on data it has seen before, but varies widely on unseen data. Neither an overfit nor underfit model is generalizable - that is, able to make meaningful predictions on unseen data." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"There is a trade-off between bias and variance [...]. Complexity increases with the number of features, parameters, depth, training epochs, etc. As complexity increases and the model overfits, the error on the training data decreases, but the error on test data increases, meaning that the model is less generalizable." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"Unfortunately, because the search space is large, automatic techniques for optimization are not sufficient. Instead, the process of selecting an optimal model is complex and iterative, involving repeated cycling through feature engineering, model selection, and hyperparameter tuning. Results are evaluated after each iteration in order to arrive at the best combination of features, model, and parameters that will solve the problem at hand." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"Unsupervised learning or clustering is a way of discovering hidden structures in unlabeled data. Clustering algorithms aim to discover latent patterns in unlabeled data using features to organize instances into meaningfully dissimilar groups." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

🖍️Gary Smith - Collected Quotes

"A computer makes calculations quickly and correctly, but doesn’t ask if the calculations are meaningful or sensible. A computer just does what it is told." (Gary Smith, "Standard Deviations", 2014)

"A study that leaves out data is waving a big red flag. A decision to include orxclude data sometimes makes all the difference in the world. This decision should be based on the relevance and quality of the data, not on whether the data support or undermine a conclusion that is expected or desired." (Gary Smith, "Standard Deviations", 2014)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"Comparisons are the lifeblood of empirical studies. We can’t determine if a medicine, treatment, policy, or strategy is effective unless we compare it to some alternative. But watch out for superficial comparisons: comparisons of percentage changes in big numbers and small numbers, comparisons of things that have nothing in common except that they increase over time, comparisons of irrelevant data. All of these are like comparing apples to prunes." (Gary Smith, "Standard Deviations", 2014)

"Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense and it needs to be tested with uncontaminated data." (Gary Smith, "Standard Deviations", 2014)

"Data without theory can fuel a speculative stock market bubble or create the illusion of a bubble where there is none. How do we tell the difference between a real bubble and a false alarm? You know the answer: we need a theory. Data are not enough. […] Data without theory is alluring, but misleading." (Gary Smith, "Standard Deviations", 2014)

"Don’t just do the calculations. Use common sense to see whether you are answering the correct question, the assumptions are reasonable, and the results are plausible. If a statistical argument doesn’t make sense, think about it carefully - you may discover that the argument is nonsense." (Gary Smith, "Standard Deviations", 2014)

"Graphs can help us interpret data and draw inferences. They can help us see tendencies, patterns, trends, and relationships. A picture can be worth not only a thousand words, but a thousand numbers. However, a graph is essentially descriptive - a picture meant to tell a story. As with any story, bumblers may mangle the punch line and the dishonest may lie." (Gary Smith, "Standard Deviations", 2014)

"Graphs should not be mere decoration, to amuse the easily bored. A useful graph displays data accurately and coherently, and helps us understand the data. Chartjunk, in contrast, distracts, confuses, and annoys. Chartjunk may be well-intentioned, but it is misguided. It may also be a deliberate attempt to mystify." (Gary Smith, "Standard Deviations", 2014)

"How can we tell the difference between a good theory and quackery? There are two effective antidotes: common sense and fresh data. If it is a ridiculous theory, we shouldn’t be persuaded by anything less than overwhelming evidence, and even then be skeptical. Extraordinary claims require extraordinary evidence. Unfortunately, common sense is an uncommon commodity these days, and many silly theories have been seriously promoted by honest researchers." (Gary Smith, "Standard Deviations", 2014)

"If somebody ransacks data to find a pattern, we still need a theory that makes sense. On the other hand, a theory is just a theory until it is tested with persuasive data." (Gary Smith, "Standard Deviations", 2014)

 "[…] many gamblers believe in the fallacious law of averages because they are eager to find a profitable pattern in the chaos created by random chance." (Gary Smith, "Standard Deviations", 2014)

"Numbers are not inherently tedious. They can be illuminating, fascinating, even entertaining. The trouble starts when we decide that it is more important for a graph to be artistic than informative." (Gary Smith, "Standard Deviations", 2014)

"Provocative assertions are provocative precisely because they are counterintuitive - which is a very good reason for skepticism." (Gary Smith, "Standard Deviations", 2014)

"Remember that even random coin flips can yield striking, even stunning, patterns that mean nothing at all. When someone shows you a pattern, no matter how impressive the person’s credentials, consider the possibility that the pattern is just a coincidence. Ask why, not what. No matter what the pattern, the question is: Why should we expect to find this pattern?" (Gary Smith, "Standard Deviations", 2014)

"Self-selection bias occurs when people choose to be in the data - for example, when people choose to go to college, marry, or have children. […] Self-selection bias is pervasive in 'observational data', where we collect data by observing what people do. Because these people chose to do what they are doing, their choices may reflect who they are. This self-selection bias could be avoided with a controlled experiment in which people are randomly assigned to groups and told what to do." (Gary Smith, "Standard Deviations", 2014)

"The omission of zero magnifies the ups and downs in the data, allowing us to detect changes that might otherwise be ambiguous. However, once zero has been omitted, the graph is no longer an accurate guide to the magnitude of the changes. Instead, we need to look at the actual numbers." (Gary Smith, "Standard Deviations", 2014)

"These practices - selective reporting and data pillaging - are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher’s endurance. We cannot tell whether a data grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding. But more often than not, the tests stop there. After all, you won’t become a star by confirming other people’s research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged." (Gary Smith, "Standard Deviations", 2014)

"We are genetically predisposed to look for patterns and to believe that the patterns we observe are meaningful. […] Don’t be fooled into thinking that a pattern is proof. We need a logical, persuasive explanation and we need to test the explanation with fresh data." (Gary Smith, "Standard Deviations", 2014)

"We are hardwired to make sense of the world around us - to notice patterns and invent theories to explain these patterns. We underestimate how easily pat - terns can be created by inexplicable random events - by good luck and bad luck." (Gary Smith, "Standard Deviations", 2014)

"We are seduced by patterns and we want explanations for these patterns. When we see a string of successes, we think that a hot hand has made success more likely. If we see a string of failures, we think a cold hand has made failure more likely. It is easy to dismiss such theories when they involve coin flips, but it is not so easy with humans. We surely have emotions and ailments that can cause our abilities to go up and down. The question is whether these fluctuations are important or trivial." (Gary Smith, "Standard Deviations", 2014)

"We naturally draw conclusions from what we see […]. We should also think about what we do not see […]. The unseen data may be just as important, or even more important, than the seen data. To avoid survivor bias, start in the past and look forward." (Gary Smith, "Standard Deviations", 2014)

"We encounter regression in many contexts - pretty much whenever we see an imperfect measure of what we are trying to measure. Standardized tests are obviously an imperfect measure of ability." (Gary Smith, "Standard Deviations", 2014)

"With fast computers and plentiful data, finding statistical significance is trivial. If you look hard enough, it can even be found in tables of random numbers." (Gary Smith, "Standard Deviations", 2014)

"[...] a mathematically elegant procedure can generate worthless predictions. Principal components regression is just the tip of the mathematical iceberg that can sink models used by well-intentioned data scientists. Good data scientists think about their tools before they use them." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"A neural-network algorithm is simply a statistical procedure for classifying inputs (such as numbers, words, pixels, or sound waves) so that these data can mapped into outputs. The process of training a neural-network model is advertised as machine learning, suggesting that neural networks function like the human mind, but neural networks estimate coefficients like other data-mining algorithms, by finding the values for which the model’s predictions are closest to the observed values, with no consideration of what is being modeled or whether the coefficients are sensible." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Clowns fool themselves. Scientists don’t. Often, the easiest way to differentiate a data clown from a data scientist is to track the successes and failures of their predictions. Clowns avoid experimentation out of fear that they’re wrong, or wait until after seeing the data before stating what they expected to find. Scientists share their theories, question their assumptions, and seek opportunities to run experiments that will verify or contradict them. Most new theories are not correct and will not be supported by experiments. Scientists are comfortable with that reality and don’t try to ram a square peg in a round hole by torturing data or mangling theories. They know that science works, but only if it’s done right." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Data-mining tools, in general, tend to be mathematically sophisticated, yet often make implausible assumptions. Too often, the assumptions are hidden in the math and the people who use the tools are more impressed by the math than curious about the assumptions. Instead of being blinded by math, good data scientists use assumptions and models that make sense. Good data scientists use math, but do not worship it. They know that math is an invaluable tool, but it is not a substitute for common sense, wisdom, or expertise." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Deep neural networks have an input layer and an output layer. In between, are “hidden layers” that process the input data by adjusting various weights in order to make the output correspond closely to what is being predicted. [...] The mysterious part is not the fancy words, but that no one truly understands how the pattern recognition inside those hidden layers works. That’s why they’re called 'hidden'. They are an inscrutable black box - which is okay if you believe that computers are smarter than humans, but troubling otherwise." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Effective data scientists know that they are trying to convey accurate information in an easily understood way. We have never seen a pie chart that was an improvement over a simple table. Even worse, the creative addition of pictures, colors, shading, blots, and splotches may produce chartjunk that confuses the reader and strains the eyes." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Good data scientists are careful when they compare samples of different sizes. It is easier for small groups to be lucky. It’s also easier for small groups to be unlucky." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Good data scientists consider the reliability of the data, while data clowns don’t. It’s also important to know if there are unreported 'silent data'. If something is surprising about top-ranked groups, ask to see the bottom-ranked groups. Consider the possibility of survivorship bias and self-selection bias. Incomplete, inaccurate, or unreliable data can make fools out of anyone." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Good data scientists do not cherry pick data by excluding data that do not support their claims. One of the most bitter criticisms of statisticians is that, 'Figures don’t lie, but liars figure.' An unscrupulous statistician can prove most anything by carefully choosing favorable data and ignoring conflicting evidence." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Good data scientists know that, because of inevitable ups and downs in the data for almost any interesting question, they shouldn’t draw conclusions from small samples, where flukes might look like evidence." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Good data scientists know that some predictions are inherently difficult and we should not expect anything close to 100 percent accuracy. It is better to construct a reasonable model and acknowledge its uncertainty than to expect the impossible." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Good data scientists know that they need to get the assumptions right. It is not enough to have fancy math. Clever math with preposterous premises can be disastrous. [...] Good data scientists think about what they are modeling before making assumptions." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"In addition to overfitting the data by sifting through a kitchen sink of variables, data scientists can overfit the data by trying a wide variety of nonlinear models." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"It is certainly good data science practice to set aside data to test models. However, suppose that we data mine lots of useless models, and test them all on set-aside data. Just as some useless models are certain to fit the original data, some, by luck alone, are certain to fit the set-aside data too. Finding a model that fits both the original data and the set-aside data is just another form of data mining. Instead of discovering a model that fits half the data, we discover a model that fits all the data. That makes the problem less likely, but doesn’t solve it." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"It is tempting to think that because computers can do some things extremely well, they must be highly intelligent, but being useful for specific tasks is very different from having a general intelligence that applies the lessons learned and the skills required for one task to more complex tasks, or completely different tasks." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Machines do not know which features to ignore and which to focus on, since that requires real knowledge of the real world. In the absence of such knowledge, computers focus on idiosyncrasies in the data that maximize their success with the training data, without considering whether these idiosyncrasies are useful for making predictions with fresh data. Because they don’t truly understand Real-World, computers cannot distinguish between the meaningful and the meaningless." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Mathematicians love math and many non-mathematicians are intimidated by math. This is a lethal combination that can lead to the creation of wildly unrealistic mathematical models. [...] A good mathematical model starts with plausible assumptions and then uses mathematics to derive the implications. A bad model focuses on the math and makes whatever assumptions are needed to facilitate the math." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Monte Carlo simulations handle uncertainty by using a computer’s random number generator to determine outcomes. Done over and over again, the simulations show the distribution of the possible outcomes. [...] The beauty of these Monte Carlo simulations is that they allow users to see the probabilistic consequences of their decisions, so that they can make informed choices. [...] Monte Carlo simulations are one of the most valuable applications of data science because they can be used to analyze virtually any uncertain situation where we are able to specify the nature of the uncertainty [...]" (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Neural-network algorithms do not know what they are manipulating, do not understand their results, and have no way of knowing whether the patterns they uncover are meaningful or coincidental. Nor do the programmers who write the code know exactly how they work and whether the results should be trusted. Deep neural networks are also fragile, meaning that they are sensitive to small changes and can be fooled easily." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"One of the paradoxical things about computers is that they can excel at things that humans consider difficult (like calculating square roots) while failing at things that humans consider easy (like recognizing stop signs). They do not understand the world the way humans do. They have neither common sense nor wisdom. They are our tools, not our masters. Good data scientists know that data analysis still requires expert knowledge." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Outliers are sometimes clerical errors, measurement errors, or flukes that, if not corrected or omitted, will distort the data. At other times, they are the most important observations. Either way, good data scientists look at their data before analyzing them." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Regression toward the mean is NOT the fallacious law of averages, also known as the gambler’s fallacy. The fallacious law of averages says that things must balance out - that making a free throw makes a player more likely to miss the next shot; a coin flip that lands heads makes tails more likely on the next flip; and good luck now makes bad luck more likely in the future." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Statistical correlations are a poor substitute for expertise. The best way to build models of the real world is to start with theories that are appealing and then test these models. Models that make sense can be used to make useful predictions." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"The binomial distribution applies to things like coin flips, where every flip has the same constant probability of occurring. Jay saw several problems. [...] the binomial distribution assumes that the outcomes are independent, the way that a coin flip doesn’t depend on previous flips. [...] The binomial distribution is elegant mathematics, but it should be used when its assumptions are true, not because the math is elegant." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"The label neural networks suggests that these algorithms replicate the neural networks in human brains that connect electrically excitable cells called neurons. They don’t. We have barely scratched the surface in trying to figure out how neurons receive, store, and process information, so we cannot conceivably mimic them with computers."  (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"The logic of regression is simple, but powerful. Our lives are filled with uncertainties. The difference between what we expect to happen and what actually does happen is, by definition, unexpected. We can call these unexpected surprises chance, luck, or some other convenient shorthand. The important point is that, no matter how reasonable or rational our expectations, things sometimes turn out to be higher or lower, larger or smaller, stronger or weaker than expected." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"The plausibility of the assumptions is more important than the accuracy of the math. There is a well-known saying about data analysis: 'Garbage in, garbage out.' No matter how impeccable the statistical analysis, bad data will yield useless output. The same is true of mathematical models that are used to make predictions. If the assumptions are wrong, the predictions are worthless." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"The principle behind regression toward the mean is that extraordinary performances exaggerate how far the underlying trait is from average. [...] Regression toward the mean also works for the worst performers. [...] Regression toward the mean is a purely statistical phenomenon that has nothing at all to do with ability improving or deteriorating over time." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Useful data analysis requires good data. [...] Good data scientists also consider the reliability of their data. [...] If the data tell you something crazy, there’s a good chance you would be crazy to believe the data." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

16 April 2006

🖍️Danish Haroon - Collected Quotes

"Boosting defines an objective function to measure the performance of a model given a certain set of parameters. The objective function contains two parts: regularization and training loss, both of which add to one another. The training loss measures how predictive our model is on the training data. The most commonly used training loss function includes mean squared error and logistic regression. The regularization term controls the complexity of the model, which helps avoid overfitting." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"Boosting is a non-linear flexible regression technique that helps increase the accuracy of trees by assigning more weights to wrong predictions. The reason for inducing more weight is so the model can emphasize more on these wrongly predicted samples and tune itself to increase accuracy. The gradient boosting method solves the inherent problem in boosting trees (i.e., low speed and human interpretability). The algorithm supports parallelism by specifying the number of threads." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"Cluster analysis refers to the grouping of observations so that the objects within each cluster share similar properties, and properties of all clusters are independent of each other. Cluster algorithms usually optimize by maximizing the distance among clusters and minimizing the distance between objects in a cluster. Cluster analysis does not complete in a single iteration but goes through several iterations until the model converges. Model convergence means that the cluster memberships of all objects converge and don’t change with every new iteration." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"In Boosting, the selection of samples is done by giving more and more weight to hard-to-classify observations. Gradient boosting classification produces a prediction model in the form of an ensemble of weak predictive models, usually decision trees. It generalizes the model by optimizing for the arbitrary differentiable loss function. At each stage, regression trees fit on the negative gradient of binomial or multinomial deviance loss function." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"Multicollinearity and Singularity are two concepts which undermines the regression modeling, resulting in bizarre and inaccurate results. If exploratory variables are highly correlated, then regression becomes vulnerable to biases. Multicollinearity refers to a correlation of 0.9 or higher, whereas singularity refers to a perfect correlation (i.e., 1)." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"Multivariate analysis refers to incorporation of multiple exploratory variables to understand the behavior of a response variable. This seems to be the most feasible and realistic approach considering the fact that entities within this world are usually interconnected. Thus the variability in response variable might be affected by the variability in the interconnected exploratory variables." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"Null hypothesis is something we attempt to find evidence against in the hypothesis tests. Null hypothesis is usually an initial claim that researchers make on the basis of previous knowledge or experience. Alternative hypothesis has a population parameter value different from that of null hypothesis. Alternative hypothesis is something you hope to come out to be true. Statistical tests are performed to decide which of these holds true in a hypothesis test. If the experiment goes in favor of the null hypothesis then we say the experiment has failed in rejecting the null hypothesis." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"Overfitting refers to the phenomenon where a model is highly fitted on a dataset. This generalization thus deprives the model from making highly accurate predictions about unseen data. [...] Underfitting is a phenomenon where the model is not trained with high precision on data at hand. The treatment of underfitting is subject to bias and variance. A model will have a high bias if both train and test errors are high [...] If a model has a high bias type underfitting, then the remedy can be to increase the model complexity, and if a model is suffering from high variance type underfitting, then the cure can be to bring in more data or otherwise make the model less complex." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"Regression describes the relationship between an exploratory variable (i.e., independent) and a response variable (i.e., dependent). Exploratory variables are also referred to as predictors and can have a frequency of more than 1. Regression is being used within the realm of predictions and forecasting. Regression determines the change in response variable when one exploratory variable is varied while the other independent variables are kept constant. This is done to understand the relationship that each of those exploratory variables exhibits." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

🖍️Galit Shmueli - Collected Quotes

"Extreme values are values that are unusually large or small compared to other values in the series. Extreme va- lue can affect different forecasting methods to various degrees. The decision whether to remove an extreme value or not must rely on information beyond the data. Is the extreme value the result of a data entry error? Was it due to an unusual event (such as an earthquake) that is unlikely to occur again in the forecast horizon? If there is no grounded justification to remove or replace the extreme value, then the best practice is to generate two sets of forecasts: those based on the series with the extreme values and those based on the series excluding the extreme values." (Galit Shmueli, "Practical Time Series Forecasting: A Hands-On Guide", 2011)

"For the purpose of choosing adequate forecasting methods, it is useful to dissect a time series into a systematic part and a non-systematic part. The systematic part is typically divided into three components: level , trend , and seasonality. The non-systematic part is called noise. The systematic components are assumed to be unobservable, as they characterize the underlying series, which we only observe with added noise." (Galit Shmueli, "Practical Time Series Forecasting: A Hands-On Guide", 2011)

"Forecasting methods attempt to isolate the systematic part and quantify the noise level. The systematic part is used for generating point forecasts and the level of noise helps assess the uncertainty associated with the point forecasts." (Galit Shmueli, "Practical Time Series Forecasting: A Hands-On Guide", 2011)

"Missing values in a time series create "holes" in the series. The presence of missing values has different implications and requires different action depending on the forecasting method." (Galit Shmueli, "Practical Time Series Forecasting: A Hands-On Guide", 2011)

"[…] noise is the random variation that results from measurement error or other causes not accounted for. It is always present in a time series to some degree, although we cannot observe it directly." (Galit Shmueli, "Practical Time Series Forecasting: A Hands-On Guide", 2011)

"Some forecasting methods directly model these components by making assumptions about their structure. For example, a popular assumption about trend is that it is linear or exponential over parts, or all, of the given time period. Another common assumption is about the noise structure: many statistical methods assume that the noise follows a normal distribution. The advantage of methods that rely on such assumptions is that when the assumptions are reasonably met, the resulting forecasts will be more robust and the models more understandable." (Galit Shmueli, "Practical Time Series Forecasting: A Hands-On Guide", 2011)

"Overfitting means that the model is not only fitting the systematic component of the data, but also the noise. An over-fitted model is therefore likely to perform poorly on new data." (Galit Shmueli, "Practical Time Series Forecasting: A Hands-On Guide", 2011)

"Understanding how performance is evaluated affects the choice of forecasting method, as well as the particular details of how a particular forecasting method is executed." (Galit Shmueli, "Practical Time Series Forecasting: A Hands-On Guide", 2011)

"When the purpose of forecasting is to generate accurate forecasts, it is useful to define performance metrics that measure predictive accuracy. Such metrics can tell us how well a particular method performs in general, as well as compared to benchmarks or forecasts from other methods." (Galit Shmueli, "Practical Time Series Forecasting: A Hands-On Guide", 2011)

15 April 2006

🖍️Gerd Gigerenzer - Collected Quotes

"Good numeric representation is a key to effective thinking that is not limited to understanding risks. Natural languages show the traces of various attempts at finding a proper representation of numbers. [...] The key role of representation in thinking is often downplayed because of an ideal of rationality that dictates that whenever two statements are mathematically or logically the same, representing them in different forms should not matter. Evidence that it does matter is regarded as a sign of human irrationality. This view ignores the fact that finding a good representation is an indispensable part of problem solving and that playing with different representations is a tool of creative thinking." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Ignorance of relevant risks and miscommunication of those risks are two aspects of innumeracy. A third aspect of innumeracy concerns the problem of drawing incorrect inferences from statistics. This third type of innumeracy occurs when inferences go wrong because they are clouded by certain risk representations. Such clouded thinking becomes possible only once the risks have been communicated." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"In my view, the problem of innumeracy is not essentially 'inside' our minds as some have argued, allegedly because the innate architecture of our minds has not evolved to deal with uncertainties. Instead, I suggest that innumeracy can be traced to external representations of uncertainties that do not match our mind’s design - just as the breakdown of color constancy can be traced to artificial illumination. This argument applies to the two kinds of innumeracy that involve numbers: miscommunication of risks and clouded thinking. The treatment for these ills is to restore the external representation of uncertainties to a form that the human mind is adapted to." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Information needs representation. The idea that it is possible to communicate information in a 'pure' form is fiction. Successful risk communication requires intuitively clear representations. Playing with representations can help us not only to understand numbers (describe phenomena) but also to draw conclusions from numbers (make inferences). There is no single best representation, because what is needed always depends on the minds that are doing the communicating." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Natural frequencies facilitate inferences made on the basis of numerical information. The representation does part of the reasoning, taking care of the multiplication the mind would have to perform if given probabilities. In this sense, insight can come from outside the mind." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Overcoming innumeracy is like completing a three-step program to statistical literacy. The first step is to defeat the illusion of certainty. The second step is to learn about the actual risks of relevant events and actions. The third step is to communicate the risks in an understandable way and to draw inferences without falling prey to clouded thinking. The general point is this: Innumeracy does not simply reside in our minds but in the representations of risk that we choose." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Statistical innumeracy is the inability to think with numbers that represent uncertainties. Ignorance of risk, miscommunication of risk, and clouded thinking are forms of innumeracy. Like illiteracy, innumeracy is curable. Innumeracy is not simply a mental defect 'inside' an unfortunate mind, but is in part produced by inadequate 'outside' representations of numbers. Innumeracy can be cured from the outside." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"The creation of certainty seems to be a fundamental tendency of human minds. The perception of simple visual objects reflects this tendency. At an unconscious level, our perceptual systems automatically transform uncertainty into certainty, as depth ambiguities and depth illusions illustrate." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002) 

"The key role of representation in thinking is often downplayed because of an ideal of rationality that dictates that whenever two statements are mathematically or logically the same, representing them in different forms should not matter. Evidence that it does matter is regarded as a sign of human irrationality. This view ignores the fact that finding a good representation is an indispensable part of problem solving and that playing with different representations is a tool of creative thinking." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"The possibility of translating uncertainties into risks is much more restricted in the propensity view. Propensities are properties of an object, such as the physical symmetry of a die. If a die is constructed to be perfectly symmetrical, then the probability of rolling a six is 1 in 6. The reference to a physical design, mechanism, or trait that determines the risk of an event is the essence of the propensity interpretation of probability. Note how propensity differs from the subjective interpretation: It is not sufficient that someone’s subjective probabilities about the outcomes of a die roll are coherent, that is, that they satisfy the laws of probability. What matters is the die’s design. If the design is not known, there are no probabilities." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"When does an uncertainty qualify as a risk? The answer depends on one’s interpretation of probability, of which there are three major versions: degree of belief, propensity, and frequency. Degrees of belief are sometimes called subjective probabilities. Of the three interpretations of probability, the subjective interpretation is most liberal about expressing uncertainties as quantitative probabilities, that is, risks. Subjective probabilities can be assigned even to unique or novel events." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"When natural frequencies are transformed into conditional probabilities, the base rate information is taken out (this is called normalization). The benefit of this normalization is that the resulting values fall within the uniform range of 0 and 1. The cost, however, is that when drawing inferences from probabilities (as opposed to natural frequencies), one has to put the base rates back in by multiplying the conditional probabilities by their respective base rates." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Why does representing information in terms of natural frequencies rather than probabilities or percentages foster insight? For two reasons. First, computational simplicity: The representation does part of the computation. And second, evolutionary and developmental primacy: Our minds are adapted to natural frequencies." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

🖍️Ray Kurzweil - Collected Quotes

"A primary reason that evolution - of life-forms or technology - speeds up is that it builds on its own increasing order." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"It is in the nature of exponential growth that events develop extremely slowly for extremely long periods of time, but as one glides through the knee of the curve, events erupt at an increasingly furious pace. And that is what we will experience as we enter the twenty-first century." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"Neither noise nor information is predictable." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"Once a computer achieves human intelligence it will necessarily roar past it." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999)

"Sometimes, a deeper order - a better fit to a purpose - is achieved through simplification rather than further increases in complexity." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999)

"The Law of Accelerating Returns: As order exponentially increases, time exponentially speeds up (that is, the time interval between salient events grows shorter as time passes)." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"The Law of Time and Chaos: In a process, the time interval between salient events (that is, events that change the nature of the process, or significantly affect the future of the process) expands of contracts along with the amount of chaos." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999) 

"Order [...] is information that fits a purpose." (Ray Kurzweil, "The Age of Spiritual Machines: When Computers Exceed Human Intelligence", 1999)

"A key aspect of a probabilistic fractal is that it enables the generation of a great deal of apparent complexity, including extensive varying detail, from a relatively small amount of design information. Biology uses this same principle. Genes supply the design information, but the detail in an organism is vastly greater than the genetic design information."  (Ray Kurzweil, "The Singularity is Near", 2005)

"Although the Singularity has many faces, its most important implication is this: our technology will match and then vastly exceed the refinement and suppleness of what we regard as the best of human traits."  (Ray Kurzweil, "The Singularity is Near", 2005)

"Evolution moves towards greater complexity, greater elegance, greater knowledge, greater intelligence, greater beauty, greater creativity, and greater levels of subtle attributes such as love. […] Of course, even the accelerating growth of evolution never achieves an infinite level, but as it explodes exponentially it certainly moves rapidly in that direction." (Ray Kurzweil, "The Singularity is Near", 2005)

"However, the law of accelerating returns pertains to evolution, which is not a closed system. It takes place amid great chaos and indeed depends on the disorder in its midst, from which it draws its options for diversity. And from these options, an evolutionary process continually prunes its choices to create ever greater order."  (Ray Kurzweil, "The Singularity is Near", 2005)

"'Increasing complexity' on its own is not, however, the ultimate goal or end-product of these evolutionary processes. Evolution results in better answers, not necessarily more complicated ones. Sometimes a superior solution is a simpler one."  (Ray Kurzweil, "The Singularity is Near", 2005)

"Machines can pool their resources, intelligence, and memories. Two machines—or one million machines - can join together to become one and then become separate again. Multiple machines can do both at the same time: become one and separate simultaneously. Humans call this falling in love, but our biological ability to do this is fleeting and unreliable." (Ray Kurzweil, "The Singularity is Near", 2005)

"Most long-range forecasts of what is technically feasible in future time periods dramatically underestimate the power of future developments because they are based on what I call the 'intuitive linear' view of history rather than the 'historical exponential'.” view'." (Ray Kurzweil, "The Singularity is Near", 2005)

"Order is information that fits a purpose. The measure of order is the measure of how well the information fits the purpose."  (Ray Kurzweil, "The Singularity is Near", 2005)

"The first idea is that human progress is exponential (that is, it expands by repeatedly multiplying by a constant) rather than linear (that is, expanding by repeatedly adding a constant). Linear versus exponential: Linear growth is steady; exponential growth becomes explosive." (Ray Kurzweil, "The Singularity is Near", 2005)

"The Singularity will represent the culmination of the merger of our biological thinking and existence with our technology, resulting in a world that is still human but that transcends our biological roots. There will be no distinction, post-Singularity, between human and machine or between physical and virtual reality. If you wonder what will remain unequivocally human in such a world, it’s simply this quality: ours is the species that inherently seeks to extend its physical and mental reach beyond current limitations." (Ray Kurzweil, "The Singularity is Near", 2005)

"[...] there are no hard problems, only problems that are hard to a certain level of intelligence." (Ray Kurzweil, "The Singularity is Near", 2005)

"When a machine manages to be simultaneously meaningful and surprising in the same rich way, it too compels a mentalistic interpretation. Of course, somewhere behind the scenes, there are programmers who, in principle, have a mechanical interpretation. But even for them, that interpretation loses its grip as the working program fills its memory with details too voluminous for them to grasp."  (Ray Kurzweil, "The Singularity is Near", 2005)

"If understanding language and other phenomena through statistical analysis does not count as true understanding, then humans have no understanding either." (Ray Kurzweil, "How to Create a Mind", 2012)

"The story of evolution unfolds with increasing levels of abstraction." (Ray Kurzweil, "How to Create a Mind", 2012)

14 April 2006

🖍️Ian Goodfellow - Collected Quotes

"Learning theory claims that a machine learning algorithm can generalize well from a finite training set of examples. This seems to contradict some basic principles of logic. Inductive reasoning, or inferring general rules from a limited set of examples, is not logically valid. To logically infer a rule describing every member of a set, one must have information about every member of that set." (Ian Goodfellow et al, "Deep Learning", 2015)

"A representation learning algorithm can discover a good set of features for a simple task in minutes, or for a complex task in hours to months." (Ian Goodfellow et al, "Deep Learning", 2015)

"Another crowning achievement of deep learning is its extension to the domain of reinforcement learning. In the context of reinforcement learning, an autonomous agent must learn to perform a task by trial and error, without any guidance from the human operator." (Ian Goodfellow et al, "Deep Learning", 2015)

"Cognitive science is an interdisciplinary approach to understanding the mind, combining multiple different levels of analysis."

"Logic provides a set of formal rules for determining what propositions are implied to be true or false given the assumption that some other set of propositions is true or false. Probability theory provides a set of formal rules for determining the likelihood of a proposition being true given the likelihood of other propositions." (Ian Goodfellow et al, "Deep Learning", 2015)

"The field of deep learning is primarily concerned with how to build computer systems that are able to successfully solve tasks requiring intelligence, while the field of computational neuroscience is primarily concerned with building more accurate models of how the brain actually works." (Ian Goodfellow et al, "Deep Learning", 2015)

"The no free lunch theorem for machine learning states that, averaged over all possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points. In other words, in some sense, no machine learning algorithm is universally any better than any other. The most sophisticated algorithm we can conceive of has the same average performance (over all possible tasks) as merely predicting that every point belongs to the same class. [...] the goal of machine learning research is not to seek a universal learning algorithm or the absolute best learning algorithm. Instead, our goal is to understand what kinds of distributions are relevant to the 'real world' that an AI agent experiences, and what kinds of machine learning algorithms perform well on data drawn from the kinds of data generating distributions we care about." (Ian Goodfellow et al, "Deep Learning", 2015)

"The no free lunch theorem implies that we must design our machine learning algorithms to perform well on a specific task. We do so by building a set of preferences into the learning algorithm. When these preferences are aligned with the learning problems we ask the algorithm to solve, it performs better." (Ian Goodfellow et al, "Deep Learning", 2015)

🖍️N D Lewis - Collected Quotes

"Deep learning is an area of machine learning that emerged from the intersection of neural networks, artificial intelligence, graphical modeling, optimization, pattern recognition and signal processing." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Overfitting is like attending a concert of your favorite band. Depending on the acoustics of the concert venue you will hear both music and noise from the screams of the crowd to reverberations off walls and so on. Overfitting happens when your model perfectly fits both the music and the noise when the intent is to fit the structure (the music). It is generally a result of the predictor being too complex (recall Occams Razor) so that it fits the underlying structure as well as the noise. The consequence is a small or zero test set classification error. Alas, this super low error rate will fail to materialize on future unseen samples. One consequence of overfitting is poor generalization (prediction) on future data." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Roughly stated, the No Free Lunch theorem states that in the lack of prior knowledge (i.e. inductive bias) on average all predictive algorithms that search for the minimum classification error (or extremum over any risk metric) have identical performance according to any measure." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Underfitting can also be a problem. It happens when the predictor is too simplistic or rigid to capture the underlying characteristics of the data. In this case the test error will be rather large." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"The bias variance decomposition is a useful tool for understanding classifier behavior. It turns out the expected misclassification rate can be decomposed into two components, a reducible error and irreducible error [...] Irreducible error or inherent uncertainty is associated with the natural variability in the phenomenon under study, and is therefore beyond our control. [...] Reducible error, as the name suggests, can be minimized. It can be decomposed into error due to squared bias and error due to variance." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"The power of deep learning models comes from their ability to classify or predict nonlinear data using a modest number of parallel nonlinear steps4. A deep learning model learns the input data features hierarchy all the way from raw data input to the actual classification of the data. Each layer extracts features from the output of the previous layer." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

🖍️Roger J Barlow - Collected Quotes

"Averaging results, whether weighted or not, needs to be done with due caution and commonsense. Even though a measurement has a small quoted error it can still be, not to put too fine a point on it, wrong. If two results are in blatant and obvious disagreement, any average is meaningless and there is no point in performing it. Other cases may be less outrageous, and it may not be clear whether the difference is due to incompatibility or just unlucky chance." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Good programming style is important, not just a luxury. Programs are organic; a good program lives a long time, and has to be changed to meet new uses; a well-written program makes such adaptations easier. A logical, clearly written program helps greatly when tracking down possible bugs. However, although there is a lot of good advice on style laid down by various people, there are no hard and fast rules, only guidelines. It is like writing English in this respect." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"In everyday life, 'estimation' means a rough and imprecise procedure leading to a rough and imprecise result. You 'estimate' when you cannot measure exactly. In statistics, on the other hand, 'estimation' is a technical term. It means a precise and accurate procedure, leading to a result which may be imprecise, but where at least the extent of the imprecision is known. It has nothing to do with approximation. You have some data, from which you want to draw conclusions and produce a 'best' value for some particular numerical quantity (or perhaps for several quantities), and you probably also want to know how reliable this value is, i.e. what the error is on your estimate." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Least squares' means just what it says: you minimise the (suitably weighted) squared difference between a set of measurements and their predicted values. This is done by varying the parameters you want to estimate: the predicted values are adjusted so as to be close to the measurements; squaring the differences means that greater importance is placed on removing the large deviations." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Probabilities are pure numbers. Probability densities, on the other hand, have dimensions, the inverse of those of the variable x to which they apply." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Science is supposed to explain to us what is actually happening, and indeed what will happen, in the world. Unfortunately as soon as you try and do something useful with it, sordid arithmetical numbers start getting in the way and messing up the basic scientific laws." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Statistics is a tool. In experimental science you plan and carry out experiments, and then analyse and interpret the results. To do this you use statistical arguments and calculations. Like any other tool - an oscilloscope, for example, or a spectrometer, or even a humble spanner - you can use it delicately or clumsily, skillfully or ineptly. The more you know about it and understand how it works, the better you will be able to use it and the more useful it will be." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Subjective probability, also known as Bayesian statistics, pushes Bayes' theorem further by applying it to statements of the type described as 'unscientific' in the frequency definition. The probability of a theory (e.g. that it will rain tomorrow or that parity is not violated) is considered to be a subjective 'degree of belief - it can perhaps be measured by seeing what odds the person concerned will offer as a bet. Subsequent experimental evidence then modifies the initial degree of belief, making it stronger or weaker according to whether the results agree or disagree with the predictions of the theory in question." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"The best way to learn to write good programs is to read other people's. Notice whether they are clear or obscure, and try and understand why. Then imitate the good points and avoid the bad in your own programs." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"The principle of maximum likelihood is not a rule that requires justification - it does not need to be proved. It is merely a sensible way of producing an estimator. But although the name 'maximum likelihood' has a nice ring to it - it suggest that your estimate is the 'most likely' value - this is an unfair interpretation; it is the estimate that makes your data most likely - another thing altogether." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"There is a technical difference between a bar chart and a histogram in that the number represented is proportional to the length of bar in the former and the area in the latter. This matters if non-uniform binning is used. Bar charts can be used for qualitative or quantitative data, whereas histograms can only be used for quantitative data, as no meaning can be attached to the width of the bins if the data are qualitative." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989) 

"There is an obvious parallel between an expectation value and the mean of a data sample. The former is a sum over a theoretical probability distribution and the latter is a (similar) sum over a real data sample. The law of large numbers ensures that if a data sample is described by a theoretical distribution, then as N, the size of the data sample, goes to infinity […]." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"When you want to use some data to give the answer to a question, the first step is to formulate the question precisely by expressing it as a hypothesis. Then you consider the consequences of that hypothesis, and choose a suitable test to apply to the data. From the result of the test you accept or reject the hypothesis according to prearranged criteria. This cannot be infallible, and there is always a chance of getting the wrong answer, so you try and reduce the chance of such a mistake to a level which you consider reasonable." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

13 April 2006

🖍️Peter L Bernstein - Collected Quotes

"A normal distribution is most unlikely, although not impossible, when the observations are dependent upon one another - that is, when the probability of one event is determined by a preceding event. The observations will fail to distribute themselves symmetrically around the mean." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"All the law [of large numbers] tells us is that the average of a large number of throws will be more likely than the average of a small number of throws to differ from the true average by less than some stated amount. And there will always be a possibility that the observed result will differ from the true average by a larger amount than the specified bound." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"But real-life situations often require us to measure probability in precisely this fashion - from sample to universe. In only rare cases does life replicate games of chance, for which we can determine the probability of an outcome before an event even occurs - a priori […] . In most instances, we have to estimate probabilities from what happened after the fact - a posteriori. The very notion of a posteriori implies experimentation and changing degrees of belief." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Probability theory is a serious instrument for forecasting, but the devil, as they say, is in the details - in the quality of information that forms the basis of probability estimates." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand. [...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Under conditions of uncertainty, both rationality and measurement are essential to decision-making. Rational people process information objectively: whatever errors they make in forecasting the future are random errors rather than the result of a stubborn bias toward either optimism or pessimism. They respond to new information on the basis of a clearly defined set of preferences. They know what they want, and they use the information in ways that support their preferences." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Under conditions of uncertainty, the choice is not between rejecting a hypothesis and accepting it, but between reject and not - reject. You can decide that the probability that you are wrong is so small that you should not reject the hypothesis. You can decide that the probability that you are wrong is so large that you should reject the hypothesis. But with any probability short of zero that you are wrong - certainty rather than uncertainty - you cannot accept a hypothesis." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Until we can distinguish between an event that is truly random and an event that is the result of cause and effect, we will never know whether what we see is what we'll get, nor how we got what we got. When we take a risk, we are betting on an outcome that will result from a decision we have made, though we do not know for certain what the outcome will be. The essence of risk management lies in maximizing the areas where we have some control over the outcome while minimizing the areas where we have absolutely no control over the outcome and the linkage between effect and cause is hidden from us." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"We can assemble big pieces of information and little pieces, but we can never get all the pieces together. We never know for sure how good our sample is. That uncertainty is what makes arriving at judgments so difficult and acting on them so risky. […] When information is lacking, we have to fall back on inductive reasoning and try to guess the odds." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Whenever we make any decision based on the expectation that matters will return to 'normal', we are employing the notion of regression to the mean." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

🖍️Phillip I Good - Collected Quotes

"A major problem with many studies is that the population of interest is not adequately defined before the sample is drawn. Don’t make this mistake. A second major source of error is that the sample proves to have been drawn from a different population than was originally envisioned." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"A permutation test based on the original observations is appropriate only if one can assume that under the null hypothesis the observations are identically distributed in each of the populations from which the samples are drawn. If we cannot make this assumption, we will need to transform the observations, throwing away some of the information about them so that the distributions of the transformed observations are identical." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"A well-formulated hypothesis will be both quantifiable and testable - that is, involve measurable quantities or refer to items that may be assigned to mutually exclusive categories. [...] When the objective of our investigations is to arrive at some sort of conclusion, then we need to have not only a hypothesis in mind, but also one or more potential alternative hypotheses." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Before we initiate data collection, we must have a firm idea of what we will measure. A second fundamental principle is also applicable to both experiments and surveys: Collect exact values whenever possible. Worry about grouping them in interval or discrete categories later. […] You can always group your results (and modify your groupings) after a study is completed. If after-the-fact grouping is a possibility, your design should state how the grouping will be determined; otherwise there will be the suspicion that you chose the grouping to obtain desired results." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Estimation methods should be impartial. Decisions should not depend on the accidental and quite irrelevant labeling of the samples. Nor should decisions depend on the units in which the measurements are made." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Every statistical procedure relies on certain assumptions for correctness. Errors in testing hypotheses come about either because the assumptions underlying the chosen test are not satisfied or because the chosen test is less powerful than other competing procedures."(Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"[…] finding at least one cluster of events in time or in spaceh as a greater probability than finding no clusters at all (equally spaced events)." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Graphical illustrations should be simple and pleasing to the eye, but the presentation must remain scientific. In other words, we want to avoid those graphical features that are purely decorative while keeping a critical eye open for opportunities to enhance the scientific inference we expect from the reader. A good graphical design should maximize the proportion of the ink used for communicating scientific information in the overall display." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"If the sample is not representative of the population because the sample is small or biased, not selected at random, or its constituents are not independent of one another, then the bootstrap will fail. […] For a given size sample, bootstrap estimates of percentiles in the tails will always be less accurate than estimates of more centrally located percentiles. Similarly, bootstrap interval estimates for the variance of a distribution will always be less accurate than estimates of central location such as the mean or median because the variance depends strongly upon extreme values in the population." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"More important than comparing the means of populations can be determining why the variances are different." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Most statistical procedures rely on two fundamental assumptions: that the observations are independent of one another and that they are identically distributed. If your methods of collection fail to honor these assumptions, then your analysis must fail also." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Never assign probabilities to the true state of nature, but only to the validity of your own predictions." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The greatest error associated with the use of statistical procedures is to make the assumption that one single statistical methodology can suffice for all applications. […] But one methodology can never be better than another, nor can estimation replace hypothesis testing or vice versa. Every methodology has a proper domain of application and another set of applications for which it fails. Every methodology has its drawbacks and its advantages, its assumptions, and its sources of error." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The sources of error in applying statistical procedures are legion and include all of the following: (•) Using the same set of data both to formulate hypotheses and to test them. (•) Taking samples from the wrong population or failing to specify the population(s) about which inferences are to be made in advance. (•) Failing to draw random, representative samples. (•) Measuring the wrong variables or failing to measure what you’d hoped to measure. (•) Using inappropriate or inefficient statistical methods. (•) Failing to validate models. But perhaps the most serious source of error lies in letting statistical procedures make decisions for you." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The vast majority of errors in estimation stem from a failure to measure what one wanted to measure or what one thought one was measuring. Misleading definitions, inaccurate measurements, errors in recording and transcription, and confounding variables plague results. To forestall such errors, review your data collection protocols and procedure manuals before you begin, run several preliminary trials, record potential confounding variables, monitor data collection, and review the data as they are collected." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The vast majority of errors in Statistics - and not incidentally, in most human endeavors - arise from a reluctance (or even an inability) to plan. Some demon (or demonic manager) seems to be urging us to cross the street before we’ve had the opportunity to look both ways. Even on those rare occasions when we do design an experiment, we seem more obsessed with the mechanics than with the concepts that underlie it." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Use statistics as a guide to decision making rather than a mandate." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"When we assert that for a given population a percentage of samples will have a specific composition, this also is a deduction. But when we make an inductive generalization about a population based upon our analysis of a sample, we are on shakier ground. It is one thing to assert that if an observation comes from a normal distribution with mean zero, the probability is one-half that it is positive. It is quite another if, on observing that half the observations in the sample are positive, we assert that half of all the possible observations that might be drawn from that population will be positive also." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"While a null hypothesis can facilitate statistical inquiry - an exact permutation test is impossible without it - it is never mandated. In any event, virtually any quantifiable hypothesis can be converted into null form. There is no excuse and no need to be content with a meaningless null. […] We must specify our alternatives before we commence an analysis, preferably at the same time we design our study." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.