01 December 2018

Data Science: Iterations (Just the Quotes)

"Data analysis must be iterative to be effective. [...] The iterative and interactive interplay of summarizing by fit and exposing by residuals is vital to effective data analysis. Summarizing and exposing are complementary and pervasive." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"Statistical methods are tools of scientific investigation. Scientific investigation is a controlled learning process in which various aspects of a problem are illuminated as the study proceeds. It can be thought of as a major iteration within which secondary iterations occur. The major iteration is that in which a tentative conjecture suggests an experiment, appropriate analysis of the data so generated leads to a modified conjecture, and this in turn leads to a new experiment, and so on." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)

"Iteration and experimentation are important for all of data analysis, including graphical data display. In many cases when we make a graph it is immediately clear that some aspect is inadequate and we regraph the data. In many other cases we make a graph, and all is well, but we get an idea for studying the data in a different way with a different graph; one successful graph often suggests another." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Apart from power laws, iteration is one of the prime sources of self-similarity. Iteration here means the repeated application of some rule or operation - doing the same thing over and over again. […] A concept closely related to iteration is recursion. In an age of increasing automation and computation, many processes and calculations are recursive, and if a recursive algorithm is in fact repetitious, self-similarity is waiting in the wings."(Manfred Schroeder, "Fractals, Chaos, Power Laws Minutes from an Infinite Paradise", 1990)

"Fitting is essential to visualizing hypervariate data. The structure of data in many dimensions can be exceedingly complex. The visualization of a fit to hypervariate data, by reducing the amount of noise, can often lead to more insight. The fit is a hypervariate surface, a function of three or more variables. As with bivariate and trivariate data, our fitting tools are loess and parametric fitting by least-squares. And each tool can employ bisquare iterations to produce robust estimates when outliers or other forms of leptokurtosis are present." (William S Cleveland, "Visualizing Data", 1993)

"Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: 'there’s a lot of data, what can you make from it?'" (Mike Loukides, "What Is Data Science?", 2011)

"Overfitting occurs when a formula describes a set of data very closely, but does not lead to any sensible explanation for the behavior of the data and does not predict the behavior of comparable data sets. In the case of overfitting, the formula is said to describe the noise of the system rather than the characteristic behavior of the system. Overfitting occurs frequently with models that perform iterative approximations on training data, coming closer and closer to the training data set with each iteration. Neural networks are an example of a data modeling strategy that is prone to overfitting." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"Geometric pattern repeated at progressively smaller scales, where each iteration is about a reproduction of the image to produce completely irregular shapes and surfaces that can not be represented by classical geometry. Fractals are generally self-similar (each section looks at all) and are not subordinated to a specific scale. They are used especially in the digital modeling of irregular patterns and structures in nature." (Mauro Chiarella, "Folds and Refolds: Space Generation, Shapes, and Complex Components", 2016)

"Cluster analysis refers to the grouping of observations so that the objects within each cluster share similar properties, and properties of all clusters are independent of each other. Cluster algorithms usually optimize by maximizing the distance among clusters and minimizing the distance between objects in a cluster. Cluster analysis does not complete in a single iteration but goes through several iterations until the model converges. Model convergence means that the cluster memberships of all objects converge and don’t change with every new iteration." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.