22 April 2006

Foster Provost - Collected Quotes

"Data mining is a craft. As with many crafts, there is a well-defined process that can help to increase the likelihood of a successful result. This process is a crucial conceptual tool for thinking about data science projects. [...] data mining is an exploratory undertaking closer to research and development than it is to engineering." (Foster Provost, "Data Science for Business", 2013)

"Formulating data mining solutions and evaluating the results involves thinking carefully about the context in which they will be used." (Foster Provost, "Data Science for Business", 2013)

"[…] framing a business problem in terms of expected value can allow us to systematically decompose it into data mining tasks." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"If you look too hard at a set of data, you will find something - but it might not generalize beyond the data you’re looking at. This is referred to as overfitting a dataset. Data mining techniques can be very powerful, and the need to detect and avoid overfitting is one of the most important concepts to grasp when applying data mining to real problems. The concept of overfitting and its avoidance permeates data science processes, algorithms, and evaluation methods." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"In analytics, it’s more important for individuals to be able to formulate problems well, to prototype solutions quickly, to make reasonable assumptions in the face of ill-structured problems, to design experiments that represent good investments, and to analyze results." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"In common usage, prediction means to forecast a future event. In data science, prediction more generally means to estimate an unknown value. This value could be something in the future (in common usage, true prediction), but it could also be something in the present or in the past. Indeed, since data mining usually deals with historical data, models very often are built and tested using events from the past." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"In data science, a predictive model is a formula for estimating the unknown value of interest: the target. The formula could be mathematical, or it could be a logical statement such as a rule. Often it is a hybrid of the two." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"There is another important distinction pertaining to mining data: the difference between (1) mining the data to find patterns and build models, and (2) using the results of data mining. Students often confuse these two processes when studying data science, and managers sometimes confuse them when discussing business analytics. The use of data mining results should influence and inform the data mining process itself, but the two should be kept distinct." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"There is convincing evidence that data-driven decision-making and big data technologies substantially improve business performance. Data science supports data-driven decision-making - and sometimes conducts such decision-making automatically - and depends upon technologies for 'big data' storage and engineering, but its principles are separate." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Unfortunately, creating an objective function that matches the true goal of the data mining is usually impossible, so data scientists often choose based on faith and experience." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.