28 November 2018

Data Science: Classification (Just the Quotes)

"Classification is the process of arranging data into sequences and groups according to their common characteristics, or separating them into different but related parts." (Horace Secrist, "An Introduction to Statistical Methods", 1917)


"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"A classification is a scheme for breaking a category into a set of parts, called classes, according to some precisely defined differing characteristics possessed by all the elements of the category." (Alva M Tuttle, "Elementary Business and Economic Statistics", 1957)

"It might be reasonable to expect that the more we know about any set of statistics, the greater the confidence we would have in using them, since we would know in which directions they were defective; and that the less we know about a set of figures, the more timid and hesitant we would be in using them. But, in fact, it is the exact opposite which is normally the case; in this field, as in many others, knowledge leads to caution and hesitation, it is ignorance that gives confidence and boldness. For knowledge about any set of statistics reveals the possibility of error at every stage of the statistical process; the difficulty of getting complete coverage in the returns, the difficulty of framing answers precisely and unequivocally, doubts about the reliability of the answers, arbitrary decisions about classification, the roughness of some of the estimates that are made before publishing the final results. Knowledge of all this, and much else, in detail, about any set of figures makes one hesitant and cautious, perhaps even timid, in using them." (Ely Devons, "Essays in Economics", 1961)

"Many of the basic functions performed by neural networks are mirrored by human abilities. These include making distinctions between items (classification), dividing similar things into groups (clustering), associating two or more things (associative memory), learning to predict outcomes based on examples (modeling), being able to predict into the future (time-series forecasting), and finally juggling multiple goals and coming up with a good- enough solution (constraint satisfaction)." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"The methods of science include controlled experiments, classification, pattern recognition, analysis, and deduction. In the humanities we apply analogy, metaphor, criticism, and (e)valuation. In design we devise alternatives, form patterns, synthesize, use conjecture, and model solutions." (Béla H Bánáthy, "Designing Social Systems in a Changing World", 1996) 

"While classification is important, it can certainly be overdone. Making too fine a distinction between things can be as serious a problem as not being able to decide at all. Because we have limited storage capacity in our brain (we still haven't figured out how to add an extender card), it is important for us to be able to cluster similar items or things together. Not only is clustering useful from an efficiency standpoint, but the ability to group like things together (called chunking by artificial intelligence practitioners) is a very important reasoning tool. It is through clustering that we can think in terms of higher abstractions, solving broader problems by getting above all of the nitty-gritty details." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"We build models to increase productivity, under the justified assumption that it's cheaper to manipulate the model than the real thing. Models then enable cheaper exploration and reasoning about some universe of discourse. One important application of models is to understand a real, abstract, or hypothetical problem domain that a computer system will reflect. This is done by abstraction, classification, and generalization of subject-matter entities into an appropriate set of classes and their behavior." (Stephen J Mellor, "Executable UML: A Foundation for Model-Driven Architecture", 2002)

"Compared to traditional statistical studies, which are often hindsight, the field of data mining finds patterns and classifications that look toward and even predict the future. In summary, data mining can (1) provide a more complete understanding of data by finding patterns previously not seen and (2) make models that predict, thus enabling people to make better decisions, take action, and therefore mold future events." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"The well-known 'No Free Lunch' theorem indicates that there does not exist a pattern classification method that is inherently superior to any other, or even to random guessing without using additional information. It is the type of problem, prior information, and the amount of training samples that determine the form of classifier to apply. In fact, corresponding to different real-world problems, different classes may have different underlying data structures. A classifier should adjust the discriminant boundaries to fit the structures which are vital for classification, especially for the generalization capacity of the classifier." (Hui Xue et al, "SVM: Support Vector Machines", 2009)

"A problem in data mining when random variations in data are misclassified as important patterns. Overfitting often occurs when the data set is too small to represent the real world." (Microsoft, "SQL Server 2012 Glossary", 2012)

"Choosing an appropriate classification algorithm for a particular problem task requires practice: each algorithm has its own quirks and is based on certain assumptions. To restate the 'No Free Lunch' theorem: no single classifier works best across all possible scenarios. In practice, it is always recommended that you compare the performance of at least a handful of different learning algorithms to select the best model for the particular problem; these may differ in the number of features or samples, the amount of noise in a dataset, and whether the classes are linearly separable or not." (Sebastian Raschka, "Python Machine Learning", 2015)

"The no free lunch theorem for machine learning states that, averaged over all possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points. In other words, in some sense, no machine learning algorithm is universally any better than any other. The most sophisticated algorithm we can conceive of has the same average performance (over all possible tasks) as merely predicting that every point belongs to the same class. [...] the goal of machine learning research is not to seek a universal learning algorithm or the absolute best learning algorithm. Instead, our goal is to understand what kinds of distributions are relevant to the 'real world' that an AI agent experiences, and what kinds of machine learning algorithms perform well on data drawn from the kinds of data generating distributions we care about." (Ian Goodfellow et al, "Deep Learning", 2015)

"Roughly stated, the No Free Lunch theorem states that in the lack of prior knowledge (i.e. inductive bias) on average all predictive algorithms that search for the minimum classification error (or extremum over any risk metric) have identical performance according to any measure." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"The power of deep learning models comes from their ability to classify or predict nonlinear data using a modest number of parallel nonlinear steps4. A deep learning model learns the input data features hierarchy all the way from raw data input to the actual classification of the data. Each layer extracts features from the output of the previous layer." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Decision trees are important for a few reasons. First, they can both classify and regress. It requires literally one line of code to switch between the two models just described, from a classification to a regression. Second, they are able to determine and share the feature importance of a given training set." (Russell Jurney, "Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark", 2017)

"Multilayer perceptrons share with polynomial classifiers one unpleasant property. Theoretically speaking, they are capable of modeling any decision surface, and this makes them prone to overfitting the training data."  (Miroslav Kubat," An Introduction to Machine Learning" 2nd Ed., 2017)

 "The main reason why pruning tends to improve classification performance on future examples is that the removal of low-level tests, which have poor statistical support, usually reduces the danger of overfitting. This, however, works only up to a certain point. If overdone, a very high extent of pruning can (in the extreme) result in the decision being replaced with a single leaf labeled with the majority class." (Miroslav Kubat," An Introduction to Machine Learning" 2nd Ed., 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The no free lunch theorems set limits on the range of optimality of any method. That is, each methodology has a ‘catchment area’ where it is optimal or nearly so. Often, intuitively, if the optimality is particularly strong then the effectiveness of the methodology falls off more quickly outside its catchment area than if its optimality were not so strong. Boosting is a case in point: it seems so well suited to binary classification that efforts to date to extend it to give effective classification (or regression) more generally have not been very successful. Overall, it remains to characterize the catchment areas where each class of predictors performs optimally, performs generally well, or breaks down." (Bertrand S Clarke & Jennifer L. Clarke, "Predictive Statistics: Analysis and Inference beyond Models", 2018)

"The premise of classification is simple: given a categorical target variable, learn patterns that exist between instances composed of independent variables and their relationship to the target. Because the target is given ahead of time, classification is said to be supervised machine learning because a model can be trained to minimize error between predicted and actual categories in the training data. Once a classification model is fit, it assigns categorical labels to new instances based on the patterns detected during training." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"A classification tree is perhaps the simplest form of algorithm, since it consists of a series of yes/no questions, the answer to each deciding the next question to be asked, until a conclusion is reached." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"An advantage of random forests is that it works with both regression and classification trees so it can be used with targets whose role is binary, nominal, or interval. They are also less prone to overfitting than a single decision tree model. A disadvantage of a random forest is that they generally require more trees to improve their accuracy. This can result in increased run times, particularly when using very large data sets." (Richard V McCarthy et al, "Applying Predictive Analytics: Finding Value in Data", 2019)

"The classifier accuracy would be extra ordinary when the test data and the training data are overlapping. But when the model is applied to a new data it will fail to show acceptable accuracy. This condition is called as overfitting." (Jesu V  Nayahi J & Gokulakrishnan K, "Medical Image Classification", 2019)

More quotes on "Classification" at the-web-of-knowledge.blogspot.com

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.