04 December 2006

Lawrence C Hamilton - Collected Quotes

"Boxplots provide information at a glance about center (median), spread (interquartile range), symmetry, and outliers. With practice they are easy to read and are especially useful for quick comparisons of two or more distributions. Sometimes unexpected features such as outliers, skew, or differences in spread are made obvious by boxplots but might otherwise go unnoticed." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Comparing normal distributions reduces to comparing only means and standard deviations. If standard deviations are the same, the task even simpler: just compare means. On the other hand, means and standard deviations may be incomplete or misleading as summaries for nonnormal distributions." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Correlation and covariance are linear regression statistics. Nonlinearity and influential cases cause the same problems for correlations, and hence for principal components/factor analysis, as they do for regression. Scatterplots should be examined routinely to check for nonlinearity and outliers. Diagnostic checks become even more important with maximum-likelihood factor analysis, which makes stronger assumptions and may be less robust than principal components or principal factors." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Data analysis is rarely as simple in practice as it appears in books. Like other statistical techniques, regression rests on certain assumptions and may produce unrealistic results if those assumptions are false. Furthermore it is not always obvious how to translate a research question into a regression model." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Data analysis typically begins with straight-line models because they are simplest, not because we believe reality is inherently linear. Theory or data may suggest otherwise [...]" (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Exploratory regression methods attempt to reveal unexpected patterns, so they are ideal for a first look at the data. Unlike other regression techniques, they do not require that we specify a particular model beforehand. Thus exploratory techniques warn against mistakenly fitting a linear model when the relation is curved, a waxing curve when the relation is S-shaped, and so forth." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"If a distribution were perfectly symmetrical, all symmetry-plot points would be on the diagonal line. Off-line points indicate asymmetry. Points fall above the line when distance above the median is greater than corresponding distance below the median. A consistent run of above-the-line points indicates positive skew; a run of below-the-line points indicates negative skew." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Principal components and factor analysis are methods for data reduction. They seek a few underlying dimensions that account for patterns of variation among the observed variables underlying dimensions imply ways to combine variables, simplifying subsequent analysis. For example, a few combined variables could replace many original variables in a regression. Advantages of this approach include more parsimonious models, improved measurement of indirectly observed concepts, new graphical displays, and the avoidance of multicollinearity." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Principal components and principal factor analysis lack a well-developed theoretical framework like that of least squares regression. They consequently provide no systematic way to test hypotheses about the number of factors to retain, the size of factor loadings, or the correlations between factors, for example. Such tests are possible using a different approach, based on maximum-likelihood estimation." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Remember that normality and symmetry are not the same thing. All normal distributions are symmetrical, but not all symmetrical distributions are normal. With water use we were able to transform the distribution to be approximately symmetrical and normal, but often symmetry is the most we can hope for. For practical purposes, symmetry (with no severe outliers) may be sufficient. Transformations are not a magic wand, however. Many distributions cannot even be made symmetrical." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Visually, skewed sample distributions have one 'longer' and one 'shorter' tail. More general terms are 'heavier' and 'lighter' tails. Tail weight reflects not only distance from the center (tail length) but also the frequency of cases at that distance (tail depth, in a histogram). Tail weight corresponds to actual weight if the sample histogram were cut out of wood and balanced like a seesaw on its median (see next section). A positively skewed distribution is heavier to the right of the median; negative skew implies the opposite." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"A well-constructed graph can show several features of the data at once. Some graphs contain as much information as the original data, and so (unlike numerical summaries) do not actually simplify the data; rather, they express it in visual form. Unexpected or unusual features, which are not obvious within numerical tables, often jump to our attention once we draw a graph. Because the strengths and weaknesses of graphical methods are opposite those of numerical summary methods, the two work best in combination." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

"Data analysis [...] begins with a dataset in hand. Our purpose in data analysis is to learn what we can from those data, to help us draw conclusions about our broader research questions. Our research questions determine what sort of data we need in the first place, and how we ought to go about collecting them. Unless data collection has been done carefully, even a brilliant analyst may be unable to reach valid conclusions regarding the original research questions." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

"Variance and its square root, the standard deviation, summarize the amount of spread around the mean, or how much a variable varies. Outliers influence these statistics too, even more than they influence the mean. On the other hand. the variance and standard deviation have important mathematical advantages that make them (together with the mean) the foundation of classical statistics. If a distribution appears reasonably symmetrical, with no extreme outliers, then the mean and standard deviation or variance are the summaries most analysts would use." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)
