18 April 2006

Leo Breiman - Collected Quotes

"Probability theory has a right and a left hand. On the right is the rigorous foundational work using the tools of measure theory. The left hand 'thinks probabilistically', reduces problems to gambling situations, coin-tossing, motions of a physical particle." (Leo Breiman, "Probability", 1992) 

"Approaching problems by looking for a data model imposes an a priori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems. The best available solution to a data problem might be a data model; then again it might be an algorithmic model. The data and the problem guide the solution. To solve a wider range of data problems, a larger set of tools is needed." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)

"Data modeling has given the statistics field many successes in analyzing data and getting information about the mechanisms producing the data. But there is also misuse leading to questionable conclusions about the underlying mechanism." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)

"One goal of statistics is to extract information from the data about the underlying mechanism producing the data. The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)

"Prediction is rarely perfect. There are usually many unmeasured variables whose effect is referred to as 'noise'. But the extent to which the model box emulates nature's box is a measure of how well our model can reproduce the natural phenomenon producing the data." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)

"The goals in statistics are to use data to predict and to get information about the underlying data mechanism. Nowhere is it written on a stone tablet what kind of model should be used to solve problems involving data. To make my position clear, I am not against data models per se. In some situations they are the most appropriate way to solve the problem. But the emphasis needs to be on the problem and on the data." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)

"The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses [...] different models, all of them equally good, may give different pictures of the relation between the predictor and response variables [...] One reason for this multiplicity is that goodness-of-fit tests and other methods for checking fit give a yes–no answer. With the lack of power of these tests with data having more than a small number of dimensions, there will be a large number of models whose fit is acceptable. There is no way, among the yes–no methods for gauging fit, of determining which is the better model." (Leo Breiman, "Statistical Modeling: The two cultures", Statistical Science 16(3), 2001)

"The point of a model is to get useful information about the relation between the response and predictor variables. Interpretability is a way of getting information. But a model does not have to be simple to provide reliable information about the relation between predictor and response variables; neither does it have to be a data model." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)

"The roots of statistics, as in science, lie in working with data and checking theory against data." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)

"There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.