"[...] lack of interaction in analysis of variance and [...] lack of correlation in bivariate distributions - such nullities would be quite surprising phenomena in the usual interactive complexities of social life." (Louis Guttman, "What is not what in statistics", The Statistician, 1977)
"Fitting is essential to visualizing hypervariate data. The structure of data in many dimensions can be exceedingly complex. The visualization of a fit to hypervariate data, by reducing the amount of noise, can often lead to more insight. The fit is a hypervariate surface, a function of three or more variables. As with bivariate and trivariate data, our fitting tools are loess and parametric fitting by least-squares. And each tool can employ bisquare iterations to produce robust estimates when outliers or other forms of leptokurtosis are present." (William S Cleveland, "Visualizing Data", 1993)
"The scatterplot is a useful exploratory method for providing a first look at bivariate data to see how they are distributed throughout the plane, for example, to see clusters of points, outliers, and so forth." (William S Cleveland, "Visualizing Data", 1993)
"The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary." (Michael Friendly, "Extending Mosaic Displays: Marginal, Conditional, and Partial Views of Categorical Data", 199)
"A time-series plot (sometimes also called a time plot) is a simple graph of data collected over time that can be invaluable in identifying trends or patterns that might be of interest.A time-series plot can be constructed by thinking of the data set as a bivariate data set, where y is the variable observed and x is the time at which the observation was made. These (x, y) pairs are plotted as in a scatterplot. Consecutive observations are then connected by a line segment; this aids in spotting trends over time." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)
"Eye-catching data graphics tend to use designs that are unique (or nearly so) without being strongly focused on the data being displayed. In the world of Infovis, design goals can be pursued at the expense of statistical goals. In contrast, default statistical graphics are to a large extent determined by the structure of the data (line plots for time series, histograms for univariate data, scatterplots for bivariate nontime-series data, and so forth), with various conventions such as putting predictors on the horizontal axis and outcomes on the vertical axis. Most statistical graphs look like other graphs, and statisticians often think this is a good thing." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)
"A correlation is simply a bivariate relationship - a fancy way of saying that there is a relationship between two ('bi') variables ('variate'). And a bivariate relationship doesn’t prove that one thing caused the other. Think of it this way: you can observe that two things appear to be related statistically, but that doesn’t tell you the answer to any of the questions you might really care about - why is there a relationship and what does it mean to us as a consumer of data?" (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)
"One problem for visualizing multiple views is that of laying out the plots. Indeed, there are some plots, such as scatterplot matrixes and trellis displays, that are formed just by arranging simpler plots according to certain rules. Scatterplot matrices, for example, arrange scatterplots side by side so that each variable in a dataset is graphed against the other variables, with the graphs being displayed as a row or a column of the matrix. This lets the user rapidly inspect all of the bivariate relationships among the variables, permitting the detection of outliers, nonlinearities, and other features of the data." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
"Correlation does not imply causation: often some other missing third variable is influencing both of the variables you are correlating. […] The need for a scatterplot arose when scientists had to examine bivariate relations between distinct variables directly. As opposed to other graphic forms - pie charts, line graphs, and bar charts - the scatterplot offered a unique advantage: the possibility to discover regularity in empirical data (shown as points) by adding smoothed lines or curves designed to pass 'not through, but among them', so as to pass from raw data to a theory-based description, analysis, and understanding." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)
See also: univariate vs. multivariate data
No comments:
Post a Comment