15 November 2024

📉Graphical Representation: Distribution (Just the Quotes)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney, "Facts from Figures", 1951)

"The impression created by a chart depends to a great extent on the shape of the grid and the distribution of time and amount scales. When your individual figures are a part of a series make sure your own will harmonize with the other illustrations in spacing of grid rulings, lettering, intensity of lines, and planned to take the same reduction by following the general style of the presentation." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"The logarithmic transformation serves several purposes:" (1) The resulting regression coefficients sometimes have a more useful theoretical interpretation compared to a regression based on unlogged variables. (2) Badly skewed distributions - in which many of the observations are clustered together combined with a few outlying values on the scale of measurement - are transformed by taking the logarithm of the measurements so that the clustered values are spread out and the large values pulled in more toward the middle of the distribution. (3) Some of the assumptions underlying the regression model and the associated significance tests are better met when the logarithm of the measured variables is taken." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Plotting on power-transformed scales (either cube roots or logs) is recommended only in those cases where the distribution is very asymmetric and the reference configuration for the untransformed plot would be a straight line through the origin." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Boxplots provide information at a glance about center (median), spread (interquartile range), symmetry, and outliers. With practice they are easy to read and are especially useful for quick comparisons of two or more distributions. Sometimes unexpected features such as outliers, skew, or differences in spread are made obvious by boxplots but might otherwise go unnoticed." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Comparing normal distributions reduces to comparing only means and standard deviations. If standard deviations are the same, the task even simpler: just compare means. On the other hand, means and standard deviations may be incomplete or misleading as summaries for nonnormal distributions." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"If a distribution were perfectly symmetrical, all symmetry-plot points would be on the diagonal line. Off-line points indicate asymmetry. Points fall above the line when distance above the median is greater than corresponding distance below the median. A consistent run of above-the-line points indicates positive skew; a run of below-the-line points indicates negative skew." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Remember that normality and symmetry are not the same thing. All normal distributions are symmetrical, but not all symmetrical distributions are normal. With water use we were able to transform the distribution to be approximately symmetrical and normal, but often symmetry is the most we can hope for. For practical purposes, symmetry (with no severe outliers) may be sufficient. Transformations are not a magic wand, however. Many distributions cannot even be made symmetrical." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Many good things happen when data distributions are well approximated by the normal. First, the question of whether the shifts among the distributions are additive becomes the question of whether the distributions have the same standard deviation; if so, the shifts are additive. […] A second good happening is that methods of fitting and methods of probabilistic inference, to be taken up shortly, are typically simple and on well understood ground. […] A third good thing is that the description of the data distribution is more parsimonious." (William S Cleveland, "Visualizing Data", 1993)

"The quantile plot is a good general display since it is fairly easy to construct and does a good job of portraying many aspects of a distribution. Three convenient features of the plot are the following: First, in constructing it, we do not make any arbitrary choices of parameter values or cell boundaries [...] and no models for the data are fitted or assumed. Second, like a table, it is not a summary but a display of all the data. Third, on the quantile plot every point is plotted at a distinct location, even if there are duplicates in the data. The number of points that can be portrayed without overlap is limited only by the resolution of the plotting device. For a high resolution device several hundred points distinguished." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Boxplots provide information at a glance about center (median), spread (interquartile range), symmetry, and outliers. With practice they are easy to read and are especially useful for quick comparisons of two or more distributions. Sometimes unexpected features such as outliers, skew, or differences in spread are made obvious by boxplots but might otherwise go unnoticed." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"A useful feature of a stem plot is that the values maintain their natural order, while at the same time they are laid out in a way that emphasizes the overall distribution of where the values are concentrated (that is, where the longer branches are). This enables you easily to pick out key values such as the median and quartiles." (Alan Graham, "Developing Thinking in Statistics", 2006)

"When displaying information visually, there are three questions one will find useful to ask as a starting point. Firstly and most importantly, it is vital to have a clear idea about what is to be displayed; for example, is it important to demonstrate that two sets of data have different distributions or that they have different mean values? Having decided what the main message is, the next step is to examine the methods available and to select an appropriate one. Finally, once the chart or table has been constructed, it is worth reflecting upon whether what has been produced truly reflects the intended message. If not, then refine the display until satisfied; for example if a chart has been used would a table have been better or vice versa?" (Jenny Freeman et al, "How to Display Data", 2008)

"The simplest and most common way to represent the empirical distribution of a numerical variable is by showing the individual values as dots arranged along a line. The main difficulty with this plot concerns how to treat tied values. We usually don't want to represent them by the same point, since that means that the two values look like one. What we can do is 'jitter' the points a bit (i.e., move them back and forth at right angles to the plot axis) so that all points are visible. […] In addition to permitting you to identify individual points, dotplots allow you to look into some of the distributional properties of a variable. […] Dotplots can also be good for looking for modality. " (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"There is no ‘correct’ way to display sets of numbers: each of the plots we have used has some advantages: strip-charts show individual points, box-and-whisker plots are convenient for rapid visual summaries, and histograms give a good feel for the underlying shape of the data distribution." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)
