"The chief problems in the technique of historigram [aka histogram] plotting are those of base line scales, types of lines to use for the graphs and methods of and purposes of smoothing these curves. The size of page, ability of grasp by the eye, subsequent treatment of the illustration, etc., are determining factors. The variable factor is usually plotted from a base line along the ordinate axis. Spacing and rules for scales apply as in frequency diagrams." (William C Marshall, "Graphical methods for schools, colleges, statisticians, engineers and executives", 1921)
"A connected graph is appropriate when the time series is smooth, so that perceiving individual values is not important. A vertical line graph is appropriate when it is important to see individual values, when we need to see short-term fluctuations, and when the time series has a large number of values; the use of vertical lines allows us to pack the series tightly along the horizontal axis. The vertical line graph, however, usually works best when the vertical lines emanate from a horizontal line through the center of the data and when there are no long-term trends in the data." (William S Cleveland, "The Elements of Graphing Data", 1985)
"If the underlying pattern of the data has gentle curvature with no local maxima and minima, then locally linear fitting is usually sufficient. But if there are local maxima or minima, then locally quadratic fitting typically does a better job of following the pattern of the data and maintaining local smoothness." (William S Cleveland, "Visualizing Data", 1993)
"The plot tells us the data are granular in the data source, something we could not ascertain with the histogram. There is an important lesson here. Statistics texts and statistical packages that recommend the histogram as the graphical starting point for a data analysis are giving bad advice. The same goes for kernel density estimates. These are appropriate second stages for graphical data analysis. The best starting point for getting a sense of the distribution of a variable is a tally, stem-and-leaf, or a dot plot. A dot plot is a special case of a tally (perhaps best thought of as a delta-neighborhood tally). Once we see that the data are not granular, we may move on to a histogram or kernel density, which smooths the data more than a dot plot." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
"Another method used to simplify the appearance of a graphic is smoothing. A regression line overlaid on a scatterplot is a smooth representation of the relationship between the two graph variables. For time series data, a moving average of the data over time is often used to smooth out the variation over small time steps in order to illustrate the overall trend." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)
"Scatterplots are the preferred medium for adding smooth curves to show a causal functional relationship or an association […] However, despite the advantage of the scatterplot for seeing some types of patterns, the linked micromap design adds geographic location to the information displayed and so enables searches for geographic patterns that the scatterplot omits." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)
"Smoothing is a technique that can be used to remove some of the variation in short-term data in favor of emphasizing long-term trends." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)