20 November 2018

Data Science: Overfitting (Definitions)

"This is when a predictive model is trained to a point where it is unable to generalize outside the training set of examples it was built from." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A condition that occurs when there are too many parameters in a model. In such cases, the model learns the idiosyncrasies of the test data set. This can happen in models such as regression, time series analysis, and neural networks." (Chandra S Amaravadi, "Exploiting the Strategic Potential of Data Mining", 2009)

"The situation that occurs when an algorithm has too many parameters or is run for too long and fits the noise as well as the signal. Overfit models become too complex for the problem or the available quantity of data." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"Learning a complicated function that matches the training data closely but fails to recognize the underlying process that generates the data. As a result of overfitting, the model performs poor on new input. Overfitting occurs when the training patterns are sparse in input space and/or the trained networks are too complex." (Frank Padberg, "Counting the Hidden Defects in Software Documents", 2010)

"A problem in data mining when random variations in data are misclassified as important patterns. Overfitting often occurs when the data set is too small to represent the real world." (Microsoft, "SQL Server 2012 Glossary", 2012)

"Overfitting occurs when a formula describes a set of data very closely, but does not lead to any sensible explanation for the behavior of the data and does not predict the behavior of comparable data sets. In the case of overfitting, the formula is said to describe the noise of the system rather than the characteristic behavior of the system. Overfitting occurs frequently with models that perform iterative approximations on training data, coming closer and closer to the training data set with each iteration. Neural networks are an example of a data modeling strategy that is prone to overfitting." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"Occurs when a model accurately describes the data used for training, but which produces errors or makes poor predictions when applied to other data samples." (Meta S Brown, "Data Mining For Dummies", 2014)

"The classifier accuracy would be extra ordinary when the test data and the training data are overlapping. But when the model is applied to a new data it will fail to show acceptable accuracy. This condition is called as overfitting." (Jesu V  Nayahi J & Gokulakrishnan K, "Medical Image Classification", 2019)

"In machine learning, our data has biases as well as useful information for our task. The more exactly our machine learning model fits the data, the more it reflects these biases. This means that the predictions may be based on spurious relationships that incidentally occur in the training data." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"A condition when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data." (Salma P Z & Maya Mohan, "Detection and Prediction of Spam Emails Using Machine Learning Models", 2021)
