"A grammar of graphics facilitates coordinated activity in a set of relatively autonomous components. This grammar enables us to develop a system in which adding a graphic to a frame (say, a surface) requires no adjustments or changes in definitions other than the simple message 'add this graphic'. Similarly, we can remove graphics, transform scales, permute attributes, and make other alterations without redefining the basic structure." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
"A graph is a set of points. A mathematical graph cannot be seen. It is an abstraction. A graphic, however, is a physical representation of a graph. This representation is accomplished by realizing graphs with aesthetic attributes such as size or color." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
"Comparing series visually can be misleading […]. Local variation is hidden when scaling the trends. We first need to make the series stationary (removing trend and/or seasonal components and/or differences in variability) and then compare changes over time. To do this, we log the series (to equalize variability) and difference each of them by subtracting last year’s value from this year’s value." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
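A minimal sketch of the transformation described in the quote above: take logs, then subtract the value from one year earlier. The data are hypothetical, and monthly sampling (a period of 12) is an assumption made here, not something the quote specifies.

```python
import math

def log_seasonal_difference(series, period=12):
    """Log-transform a positive series, then subtract the value one period earlier."""
    logged = [math.log(x) for x in series]
    return [logged[i] - logged[i - period] for i in range(period, len(logged))]

# Hypothetical monthly data: both series grow 10% per year, at very different scales.
small = [100 * 1.10 ** (m / 12) for m in range(36)]
large = [50_000 * 1.10 ** (m / 12) for m in range(36)]

d_small = log_seasonal_difference(small)
d_large = log_seasonal_difference(large)

# On the raw scale the large series dwarfs the small one; after logging and
# differencing, both reduce to the same year-over-year log growth.
print(round(d_small[0], 4), round(d_large[0], 4))
```

After the transformation the two series can be compared directly, since each value is a year-over-year change in log units rather than a level.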
"Coordinates are sets that locate points in space. These sets are usually numbers grouped in tuples, one tuple for each point. Because spaces can be defined as sets of geometric objects plus axioms defining their behavior, coordinates can be thought of more generally as schemes for mapping elements of sets to geometric objects." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
"Decision-makers process priors incorrectly in several ways. First, people tend to assess probability from the representativeness of an outcome rather than from its frequency. When supporting information is added to make an outcome more coherent and congruent with a representative mental image, people tend to judge the outcome more probable, even though the added qualifications and constraints by definition make it less probable. […] Second, humans often judge relative probability of outcomes by assessing similarity rather than frequency. […] Third, when given worthless evidence in a Bayesian framework, people tend to ignore prior probabilities and use the worthless evidence." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
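Two of the points in the quote above can be checked with a few lines of arithmetic. The sketch below uses made-up probabilities: it shows that evidence which is equally likely under both hypotheses leaves the prior unchanged under Bayes' rule, and that adding a qualification to an outcome can only lower its probability.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule for a binary hypothesis H given one piece of evidence."""
    numerator = prior * p_evidence_given_h
    denominator = numerator + (1 - prior) * p_evidence_given_not_h
    return numerator / denominator

prior = 0.3  # hypothetical prior probability of H

# Worthless evidence: equally likely whether or not H is true.
worthless = posterior(prior, 0.5, 0.5)

# Informative evidence: much more likely when H is true.
informative = posterior(prior, 0.9, 0.2)

# Conjunction: P(A and B) = P(A) * P(B | A) can never exceed P(A).
p_a = 0.2
p_b_given_a = 0.6
p_a_and_b = p_a * p_b_given_a

print(worthless, informative, p_a_and_b)
```

A decision-maker who moves away from the prior in the worthless-evidence case is making exactly the third error the quote describes.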
"Estimating the missing values in a dataset solves one problem - imputing reasonable values that have well-defined statistical properties. It fails to solve another, however - drawing inferences about parameters in a model fit to the estimated data. Treating imputed values as if they were known (like the rest of the observed data) causes confidence intervals to be too narrow and tends to bias other estimates that depend on the variability of the imputed values (such as correlations)." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
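The narrowing of confidence intervals mentioned above can be illustrated with a small simulation. This is only a sketch: the data are simulated, mean imputation is chosen here as the simplest naive scheme (the quote does not name one), and the interval uses a normal approximation.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical sample: 60 observed values, 40 missing.
observed = [random.gauss(50, 10) for _ in range(60)]
n_missing = 40

def ci_half_width(values, z=1.96):
    """Half-width of an approximate 95% confidence interval for the mean."""
    return z * statistics.stdev(values) / math.sqrt(len(values))

# Naive approach: fill every gap with the observed mean, then treat the
# filled-in values as if they had been measured.
mean_obs = statistics.mean(observed)
imputed = observed + [mean_obs] * n_missing

hw_obs = ci_half_width(observed)
hw_imp = ci_half_width(imputed)

print(f"observed only: +/- {hw_obs:.2f}")
print(f"after imputing: +/- {hw_imp:.2f}")
```

The imputed values add no deviation from the mean but do inflate the sample size, so the standard deviation shrinks and the interval narrows even though no new information was collected.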
"Human decision-making in the face of uncertainty is not only prone to error, it is also biased against Bayesian principles. We are not randomly suboptimal in our decisions. We are systematically suboptimal." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
"It is not always convenient to remember that the right model for a population can fit a sample of data worse than a wrong model - even a wrong model with fewer parameters. We cannot rely on statistical diagnostics to save us, especially with small samples. We must think about what our models mean, regardless of fit, or we will promulgate nonsense." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
"Taxonomies are useful to scientists when they lead to new theory or stimulate insights into a problem that previous theorizing might conceal. Classification for its own sake, however, is as unproductive in design as it is in science. In design, objects are only as useful as the system they support. And the test of a design is its ability to handle scenarios that include surprises, exceptions, and strategic reversals." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
"The consequence of distinguishing statistical methods from the graphics displaying them is to separate form from function. That is, the same statistic can be represented by different types of graphics, and the same type of graphic can be used to display two different statistics. […] This separability of statistical and geometric objects is what gives a system a wide range of representational opportunities." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
"The ordinary histogram is constructed by binning data on a uniform grid. Although this is probably the most widely used statistical graphic, it is one of the more difficult ones to compute. Several problems arise, including choosing the number of bins (bars) and deciding where to place the cutpoints between bars." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
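The sensitivity to cutpoints noted above is easy to reproduce. The sketch below bins the same hypothetical data on two uniform grids that differ only in their origin. Sturges' rule is included as one common heuristic for the number of bins; it is not taken from the quoted text.

```python
import math

def bin_counts(data, origin, width, n_bins):
    """Count values in n_bins half-open intervals [origin + k*width, origin + (k+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        k = math.floor((x - origin) / width)
        if 0 <= k < n_bins:  # values outside the grid are ignored
            counts[k] += 1
    return counts

def sturges_bins(n):
    """Sturges' rule: one common heuristic for the number of bins."""
    return math.ceil(math.log2(n)) + 1

data = [1.0, 1.9, 2.1, 2.9, 3.1, 3.9, 4.1, 4.9]  # hypothetical values

peaked = bin_counts(data, origin=0, width=2, n_bins=3)  # cutpoints at 0, 2, 4, 6
flat = bin_counts(data, origin=1, width=2, n_bins=2)    # cutpoints at 1, 3, 5

print(peaked)  # [2, 4, 2]
print(flat)    # [4, 4]
print(sturges_bins(len(data)))  # 4
```

Shifting the grid by one unit turns an apparent central peak into a flat distribution, with no change to the data or the bin width.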
"The plot tells us the data are granular in the data source, something we could not ascertain with the histogram. There is an important lesson here. Statistics texts and statistical packages that recommend the histogram as the graphical starting point for a data analysis are giving bad advice. The same goes for kernel density estimates. These are appropriate second stages for graphical data analysis. The best starting point for getting a sense of the distribution of a variable is a tally, stem-and-leaf, or a dot plot. A dot plot is a special case of a tally (perhaps best thought of as a delta-neighborhood tally). Once we see that the data are not granular, we may move on to a histogram or kernel density, which smooths the data more than a dot plot." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
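A tally of the kind recommended above takes only a few lines. The sketch below uses hypothetical values recorded in steps of 5, counts each distinct value, and prints a text tally; the constant spacing between distinct values is the granularity that a coarse histogram would smooth away.

```python
from collections import Counter

# Hypothetical measurements that were recorded to the nearest 5 units.
values = [10, 10, 15, 15, 15, 20, 20, 20, 20, 25, 25, 30]

tally = Counter(values)
distinct = sorted(tally)

# Text tally: one row per distinct value, one mark per observation.
for v in distinct:
    print(f"{v:>3} | {'*' * tally[v]}")

# A single constant gap between adjacent distinct values signals granular data.
gaps = {b - a for a, b in zip(distinct, distinct[1:])}
print(gaps)  # {5}
```

Only once the tally shows no such regular spacing does it make sense to move on to a smoother summary such as a histogram or kernel density.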
"To analyze means to untangle. Even when we 'let the data speak for themselves', we need to untangle some aspect of the data before displaying things in a graphic. The more analytics we can include in the process of displaying graphics, the more flexibility our tools will have." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)