26 April 2006

Gerald van Belle - Collected Quotes

"A bar graph typically presents either averages or frequencies. It is relatively simple to present raw data (in the form of dot plots or box plots). Such plots provide much more information, and they are closer to the original data. If the bar graph categories are linked in some way - for example, doses of treatments - then a line graph will be much more informative. Very complicated bar graphs containing adjacent bars are very difficult to grasp. If the bar graph represents frequencies, and the abscissa values can be ordered, then a line graph will be much more informative and will have substantially reduced chart junk." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"A good graph displays relationships and structures that are difficult to detect by merely looking at the data." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"A probability can frequently be expressed as a ratio of the number of events divided by the number of units eligible for the event. What the rule of thumb says is to be aware of what the numerator and denominator are, particularly when assessing probabilities in a personal situation. If someone never goes hang gliding, they clearly do not need to worry about the probability of dying in a hang gliding accident." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Assess agreement by addressing accuracy, scale differential, and precision. Accuracy can be thought of as the lack of bias." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Before choosing a measure of covariation determine the source of the data (sampling scheme), the nature of the variables, and the symmetry status of the measure." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Characterizing variability requires repeatedly observing the variability since it is not a property inherent in the observation itself." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Displaying numerical information always involves selection. The process of selection needs to be described so that the reader will not be misled." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Distinguish among confidence, prediction, and tolerance intervals. Confidence intervals are statements about population means or other parameters. Prediction intervals address future (single or multiple) observations. Tolerance intervals describe the location of a specific proportion of a population, with specified confidence." (Gerald van Belle, "Statistical Rules of Thumb", 2002)
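
The difference between the first two interval types can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the quote: the sample values are made up, the data are treated as roughly normal, and a normal quantile stands in for the Student t quantile that a small sample would call for. Tolerance intervals need dedicated factors and are left out.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def intervals(sample, confidence=0.95):
    """Return (confidence interval for the mean, prediction interval
    for one future observation), using a normal approximation."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    # Two-sided normal quantile (about 1.96 for 95%); an assumption,
    # a t quantile would be more appropriate for small n.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci_half = z * s / sqrt(n)          # uncertainty about the mean only
    pi_half = z * s * sqrt(1 + 1 / n)  # adds the spread of a new observation
    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)


# Hypothetical measurements, invented for the example.
sample = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.2]
ci, pi = intervals(sample)
print("confidence interval:", ci)
print("prediction interval:", pi)
```

The prediction interval is always the wider of the two: as the sample grows, the confidence interval shrinks toward the mean, while the prediction interval levels off at roughly z times the standard deviation.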

"Do not let the scale of measurement rigidly determine the method of analysis." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Everyone agrees that there are degrees of quality of information but when asked to define the criteria there is a great deal of disagreement. The simple statistical rule that the inverse of the variance of a statistic is a measure of the information contained in the statistic provides a useful criterion for a point estimate but is clearly inadequate for comparing much bigger chunks of information such as a study." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Every statistical analysis is an interpretation of the data, and missingness affects the interpretation. The challenge is that when the reasons for the missingness cannot be determined there is basically no way to make appropriate statistical adjustments. Sensitivity analyses are designed to model and explore a reasonable range of explanations in order to assess the robustness of the results." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"In assessing change, the spacing of the observations is much more important than the number of observations." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"In using a database, first look at the metadata, then look at the data. [...] The old computer acronym GIGO (Garbage In, Garbage Out) applies to the use of large databases. The issue is whether the data from the database will answer the research question. In order to determine this, the investigator must have some idea about the nature of the data in the database - that is, the metadata." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"It is crucial to have a broad understanding of the subject matter involved. Statistical analysis is much more than just carrying out routine computations. Only with keen understanding of the subject matter can statisticians, and statistics, be most usefully engaged." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Know what properties a transformation preserves and does not preserve." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Models can be viewed and used at three levels. The first is a model that fits the data. A test of goodness-of-fit operates at this level. This level is the least useful but is frequently the one at which statisticians and researchers stop. For example, a test of a linear model is judged good when a quadratic term is not significant. A second level of usefulness is that the model predicts future observations. Such a model has been called a forecast model. This level is often required in screening studies or studies predicting outcomes such as growth rate. A third level is that a model reveals unexpected features of the situation being described, a structural model, [...] However, it does not explain the data." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Observation is selection. [...] To observe one thing implies that another is not observed, hence there is selection. This implies that the observation is taken from a larger collective, the statistical population." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Ockham's Razor in statistical analysis is used implicitly when models are embedded in richer models - for example, when testing the adequacy of a linear model by incorporating a quadratic term. If the coefficient of the quadratic term is not significant, it is dropped and the linear model is assumed to summarize the data adequately." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Precision does not vary linearly with increasing sample size. As is well known, the width of a confidence interval is a function of the square root of the number of observations. But it is more complicated than that. The basic elements determining a confidence interval are the sample size, an estimate of variability, and a pivotal variable associated with the estimate of variability." (Gerald van Belle, "Statistical Rules of Thumb", 2002)
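
The square-root relationship is easy to check numerically. The sketch below assumes the simplest case, which the quote itself warns is not the whole story: the standard deviation is treated as known and the pivotal variable is a standard normal. The function name and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a confidence interval for a mean, assuming a known
    standard deviation and a normal pivot (a simplifying assumption)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / sqrt(n)


# Quadrupling the sample size only halves the interval width.
for n in (25, 100, 400):
    print(n, round(ci_half_width(10.0, n), 3))
```

When the standard deviation has to be estimated, the normal quantile is replaced by a Student t quantile that itself depends on the sample size, which is one reason the picture is more complicated than the square root alone.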

"Randomization puts systematic sources of variability into the error term." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Since the analysis of variance is an analysis of variability of means it is possible to plot the means in many ways." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Stacked bar graphs do not show data structure well. A trend in one of the stacked variables has to be deduced by scanning along the vertical bars. This becomes especially difficult when the categories do not move in the same direction." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Statistics is the analysis of variation. There are many sources and kinds of variation. In environmental studies it is particularly important to understand the kinds of variation and the implications of the difference. Two important categories are variability and uncertainty. Variability refers to variation in environmental quantities (which may have special regulatory interest), uncertainty refers to the degree of precision with which these quantities are estimated." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The best rule is: Don't have any missing data. Unfortunately, that is unrealistic. Therefore, plan for missing data and develop strategies to account for them. Do this before starting the study. The strategy should state explicitly how the type of missingness will be examined, how it will be handled, and how the sensitivity of the results to the missing data will be assessed." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The bounds on the standard deviation are pretty crude but it is surprising how often the rule will pick up gross errors such as confusing the standard error and standard deviation, confusing the variance and the standard deviation, or reporting the mean in one scale and the standard deviation in another scale." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The content and context of the numerical data determines the most appropriate mode of presentation. A few numbers can be listed, many numbers require a table. Relationships among numbers can be displayed by statistics. However, statistics, of necessity, are summary quantities so they cannot fully display the relationships, so a graph can be used to demonstrate them visually. The attractiveness of the form of the presentation is determined by word layout, data structure, and design." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The most ubiquitous graph is the pie chart. It is a staple of the business world. [...] Never use a pie chart. Present a simple list of percentages, or whatever constitutes the divisions of the pie chart." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"[...] there are two problems with the indiscriminate multiplication of probabilities. First, multiplication without adjustment implies that the events represented by the probabilities are treated as independent. Second, since probabilities are always less than 1, the product will become smaller and smaller. If small probabilities are associated with unlikely events then, by a suitable selection, the joint occurrence of events can be made arbitrarily small." (Gerald van Belle, "Statistical Rules of Thumb", 2002)
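
A small enumeration makes the independence problem concrete. The example is an assumption chosen for illustration, not taken from the book: the sample space is two fair coin flips, and the events are defined ad hoc. Exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all outcomes of two fair coin flips.
space = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event as favourable outcomes over all outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


def first_heads(o):
    return o[0] == "H"


def second_heads(o):
    return o[1] == "H"


def first_not_tails(o):
    # The same event as first_heads, phrased differently: fully dependent.
    return o[0] != "T"


# Independent events: multiplying the probabilities gives the joint probability.
joint_ab = prob(lambda o: first_heads(o) and second_heads(o))
naive_ab = prob(first_heads) * prob(second_heads)

# Dependent events: multiplying understates the joint probability.
joint_ac = prob(lambda o: first_heads(o) and first_not_tails(o))
naive_ac = prob(first_heads) * prob(first_not_tails)

print(joint_ab, naive_ab)  # equal
print(joint_ac, naive_ac)  # the naive product is too small
```

Multiplying more and more such factors drives the naive product toward zero regardless of how the events are actually related, which is the second problem the quote describes.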

"This pie chart violates several of the rules suggested by the question posed in the introduction. First, immediacy: the reader has to turn to the legend to find out what the areas represent; and the lack of color makes it very difficult to determine which area belongs to what code. Second, the underlying structure of the data is completely ignored. Third, a tremendous amount of ink is used to display eight simple numbers." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Three key aspects of presenting high dimensional data are: rendering, manipulation, and linking. Rendering determines what is to be plotted, manipulation determines the structure of the relationships, and linking determines what information will be shared between plots or sections of the graph." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"When there is more than one source of variation it is important to identify those sources." (Gerald van Belle, "Statistical Rules of Thumb", 2002)
