25 December 2006

Leland Wilkinson - Collected Quotes

"A grammar of graphics facilitates coordinated activity in a set of relatively autonomous components. This grammar enables us to develop a system in which adding a graphic to a frame (say, a surface) requires no adjustments or changes in definitions other than the simple message 'add this graphic'. Similarly, we can remove graphics, transform scales, permute attributes, and make other alterations without redefining the basic structure."(Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"A graph is a set of points. A mathematical graph cannot be seen. It is an abstraction. A graphic, however, is a physical representation of a graph. This representation is accomplished by realizing graphs with aesthetic attributes such as size or color." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Comparing series visually can be misleading […]. Local variation is hidden when scaling the trends. We first need to make the series stationary (removing trend and/or seasonal components and/or differences in variability) and then compare changes over time. To do this, we log the series (to equalize variability) and difference each of them by subtracting last year’s value from this year’s value." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Coordinates are sets that locate points in space. These sets are usually numbers grouped in tuples, one tuple for each point. Because spaces can be defined as sets of geometric objects plus axioms defining their behavior, coordinates can be thought of more generally as schemes for mapping elements of sets to geometric objects." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Decision-makers process priors incorrectly in several ways. First, people tend to assess probability from the representativeness of an outcome rather than from its frequency. When supporting information is added to make an outcome more coherent and congruent with a representative mental image, people tend to judge the outcome more probable, even though the added qualifications and constraints by definition make it less probable. […] Second, humans often judge relative probability of outcomes by assessing similarity rather than frequency. […] Third, when given worthless evidence in a Bayesian framework, people tend to ignore prior probabilities and use the worthless evidence." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Estimating the missing values in a dataset solves one problem - imputing reasonable values that have well-defined statistical properties. It fails to solve another, however - drawing inferences about parameters in a model fit to the estimated data. Treating imputed values as if they were known (like the rest of the observed data) causes confidence intervals to be too narrow and tends to bias other estimates that depend on the variability of the imputed values (such as correlations)." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Human decision-making in the face of uncertainty is not only prone to error, it is also biased against Bayesian principles. We are not randomly suboptimal in our decisions. We are systematically suboptimal. (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"It is not always convenient to remember that the right model for a population can fit a sample of data worse than a wrong model - even a wrong model with fewer parameters. We cannot rely on statistical diagnostics to save us, especially with small samples. We must think about what our models mean, regardless of fit, or we will promulgate nonsense." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Taxonomies are useful to scientists when they lead to new theory or stimulate insights into a problem that previous theorizing might conceal. Classification for its own sake, however, is as unproductive in design as it is in science. In design, objects are only as useful as the system they support. And the test of a design is its ability to handle scenarios that include surprises, exceptions, and strategic reversals." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The consequence of distinguishing statistical methods from the graphics displaying them is to separate form from function. That is, the same statistic can be represented by different types of graphics, and the same type of graphic can be used to display two different statistics. […] This separability of statistical and geometric objects is what gives a system a wide range of representational opportunities." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The grammar of graphics takes us beyond a limited set of charts (words) to an almost unlimited world of graphical forms (statements). The rules of graphics grammar are sometimes mathematical and sometimes aesthetic. Mathematics provides symbolic tools for representing abstractions. Aesthetics, in the original Greek sense, offers principles for relating sensory attributes (color, shape, sound, etc.) to abstractions. In modern usage, aesthetics can also mean taste." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The ordinary histogram is constructed by binning data on a uniform grid. Although this is probably the most widely used statistical graphic, it is one of the more difficult ones to compute. Several problems arise, including choosing the number of bins (bars) and deciding where to place the cutpoints between bars." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The plot tells us the data are granular in the data source, something we could not ascertain with the histogram. There is an important lesson here. Statistics texts and statistical packages that recommend the histogram as the graphical starting point for a data analysis are giving bad advice. The same goes for kernel density estimates. These are appropriate second stages for graphical data analysis. The best starting point for getting a sense of the distribution of a variable is a tally, stem-and-leaf, or a dot plot. A dot plot is a special case of a tally (perhaps best thought of as a delta-neighborhood tally). Once we see that the data are not granular, we may move on to a histogram or kernel density, which smooths the data more than a dot plot." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The visual representation of a scale - an axis with ticks - looks like a ladder. Scales are the types of functions we use to map varsets to dimensions. At first glance, it would seem that constructing a scale is simply a matter of selecting a range for our numbers and intervals to mark ticks. There is more involved, however. Scales measure the contents of a frame. They determine how we perceive the size, shape, and location of graphics. Choosing a scale (even a default decimal interval scale) requires us to think about what we are measuring and the meaning of our measurements. Ultimately, that choice determines how we interpret a graphic." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"To analyze means to untangle. Even when we 'let the data speak for themselves', we need to untangle some aspect of the data before displaying things in a graphic. The more analytics we can include in the process of displaying graphics, the more flexibility our tools will have." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.