19 December 2011

📉Graphical Representation: Scatter Charts (Just the Quotes)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as a Basis for Action", American Statistician Vol. 29 (4), 1975)

"As a general rule, plotted points and graph lines should be given more 'weight' than the axes. In this way the 'meat' will be easily distinguishable from the 'bones'. Furthermore, an illustration composed of lines of unequal weights is always more attractive than one in which all the lines are of uniform thickness. It may not always be possible to emphasise the data in this way however. In a scattergram, for example, the more plotted points there are, the smaller they may need to be and this will give them a lighter appearance. Similarly, the more curves there are on a graph, the thinner the lines may need to be. In both cases, the axes may look better if they are drawn with a somewhat bolder line so that they are easily distinguishable from the data." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"Scatter charts show the relationships between information, plotted as points on a grid. These groupings can portray general features of the source data, and are useful for showing where correlationships occur frequently. Some scatter charts connect points of equal value to produce areas within the grid which consist of similar features." (Bruce Robertson, "How to Draw Charts & Diagrams", 1988)

"The scatterplot is a useful exploratory method for providing a first look at bivariate data to see how they are distributed throughout the plane, for example, to see clusters of points, outliers, and so forth." (William S Cleveland, "Visualizing Data", 1993)

"One big advantage of parallel coordinate plots over scatterplot matrices (i.e., the matrix of scatterplots of all variable pairs) is that parallel coordinate plots need less space to plot the same amount of data. On the other hand, parallel coordinate plots with p variables show only p - 1 adjacencies. However, adjacent variables reveal most of the information in a parallel coordinate plot. Reordering variables in a parallel coordinate plot is therefore essential." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Parallel coordinate plots are often overrated concerning their ability to depict multivariate features. Scatterplots are clearly superior in investigating the relationship between two continuous variables and multivariate outliers do not necessarily stick out in a parallel coordinate plot. Nonetheless, parallel coordinate plots can help to find and understand features such as groups/clusters, outliers and multivariate structures in their multivariate context. The key feature is the ability to select and highlight individual cases or groups in the data, and compare them to other groups or the rest of the data." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Raster maps - often also called raster images - represent measurements on a regular grid. They are usually a result of remote sensing techniques via satellites or airborne surveillance systems. They fit neither the construct of scatterplots nor that of maps. Nevertheless, both scatterplots and maps can be used to display raster maps within statistics software which has no extra GIS capabilities." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"A scatterplot would show the relationship between [...] two variables in more detail, but would not convey the spatial patterns shown in […] micromap panels. Using conditioning to define a comparative grid of panels, […] changes an investigation from a sequential filtering of one variable at a time to more of a multivariable approach. In this context we can assess functional relationships, densities, or geospatial patterns within panels as well as changes across panels." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Need to consider outliers as they can affect statistics such as means, standard deviations, and correlations. They can either be explained, deleted, or accommodated (using either robust statistics or obtaining additional data to fill-in). Can be detected by methods such as box plots, scatterplots, histograms or frequency distributions." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)
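
The box-plot route to outlier detection mentioned above can be sketched in a few lines of Python. This assumes the common box-plot convention of Tukey's fences (points further than 1.5 × IQR beyond the quartiles); the function name `iqr_outliers`, the "inclusive" quartile method and the sample values are illustrative choices, not taken from the quoted book.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: k=1.5 is the usual box-plot whisker convention,
    assumed here rather than prescribed by the quoted text.
    """
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up sample with one value far from the rest
data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 45]
print(iqr_outliers(data))  # [45]
```

On a scatterplot the same idea applies per axis, although points that are unusual only in combination (bivariate outliers) need a look at the plot itself.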

"Scatterplots are the preferred medium for adding smooth curves to show a causal functional relationship or an association […] However, despite the advantage of the scatterplot for seeing some types of patterns, the linked micromap design adds geographic location to the information displayed and so enables searches for geographic patterns that the scatterplot omits." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"[...] if you want to show change through time, use a time-series chart; if you need to compare, use a bar chart; or to display correlation, use a scatter-plot - because some of these rules make good common sense." (Alberto Cairo, "The Functional Art", 2011)

"The correlation coefficient has two fabulously attractive characteristics. First, for math reasons that have been relegated to the appendix, it is a single number ranging from –1 to 1. A correlation of 1, often described as perfect correlation, means that every change in one variable is associated with an equivalent change in the other variable in the same direction. A correlation of –1, or perfect negative correlation, means that every change in one variable is associated with an equivalent change in the other variable in the opposite direction. The closer the correlation is to 1 or –1, the stronger the association. […] The second attractive feature of the correlation coefficient is that it has no units attached to it. […] The correlation coefficient does a seemingly miraculous thing: It collapses a complex mess of data measured in different units (like our scatter plots of height and weight) into a single, elegant descriptive statistic." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)
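
The two properties Wheelan highlights (a value between –1 and 1, and no units) can be checked numerically. Below is a minimal Python sketch that computes Pearson's r from its definition (covariance divided by the product of the standard deviations). The height/weight figures are made-up values for illustration, not data from the book.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg), for illustration only
height_cm = [160, 165, 170, 175, 180, 185]
weight_kg = [55, 61, 66, 70, 78, 84]

r_metric = pearson_r(height_cm, weight_kg)
# Re-expressing the data in inches and pounds changes the units but not r
r_imperial = pearson_r([h / 2.54 for h in height_cm],
                       [w * 2.20462 for w in weight_kg])
print(abs(r_metric - r_imperial) < 1e-12)  # True
```

Perfectly proportional data give exactly 1, and reversing the direction gives –1: `pearson_r([1, 2, 3], [2, 4, 6])` returns 1.0 and `pearson_r([1, 2, 3], [6, 4, 2])` returns -1.0.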

"Scatterplots are still the go-to visualization when one is examining relationships between continuous variables. One of the problems with the traditional scatterplot is that all data points are presented as if they are on equal footing. [...] Bubble maps are scatterplots with added dimensions. The most common usage is to add weight to individual data points based on population." (Christopher Lysy, "Developments in Quantitative Data Display and Their Implications for Evaluation", 2013)

"The idiom of scatterplots encodes two quantitative value variables using both the vertical and horizontal spatial position channels, and the mark type is necessarily a point. Scatterplots are effective for the abstract tasks of providing overviews and characterizing distributions, and specifically for finding outliers and extreme values. Scatterplots are also highly effective for the abstract task of judging the correlation between two attributes. With this visual encoding, that task corresponds to the easy perceptual judgement of noticing whether the points form a line along the diagonal. The stronger the correlation, the closer the points fall along a perfect diagonal line; positive correlation is an upward slope, and negative is downward." (Tamara Munzner, "Visualization Analysis and Design", 2014)

"A scatterplot reveals the strength and shape of the relationship between a pair of variables. A scatterplot represents the two variables by axes drawn at right angles to each other, showing the observations as a cloud of points, each point located according to its values on the two variables. Various lines can be added to the plot to help guide our search for understanding." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"Because we should, whenever possible, try to understand relationships between variables and not only describe each one of them in isolation, scatter plots are the most powerful charts available to us. The connected scatter plot is not easy to read at first, but I strongly encourage you to become familiar with it - at least during the exploratory stage - to check for relevant shapes in the relationships. Whenever you feel the need to use a dual-axis chart with two independent variables, you should try the connected scatter plot first." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"The ability to see meaningful shapes in the data represents the highest level of data visualization, because it represents the highest level of data integration and a richer graphical landscape. Line charts and scatter plots are frequently used for this shape visualization." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"Your goal when designing a scatter plot is to make the relationship between two variables as clear as possible, including the overall level of association but also revealing clusters and outliers. This is easier said than done. The data and a few bad design choices can make reading a scatter plot too complex or misleading." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians." (Daniel J Levitin, "Weaponized Lies", 2017)

"Correlation does not imply causation: often some other missing third variable is influencing both of the variables you are correlating. […] The need for a scatterplot arose when scientists had to examine bivariate relations between distinct variables directly. As opposed to other graphic forms - pie charts, line graphs, and bar charts - the scatterplot offered a unique advantage: the possibility to discover regularity in empirical data (shown as points) by adding smoothed lines or curves designed to pass 'not through, but among them', so as to pass from raw data to a theory-based description, analysis, and understanding." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)
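
A straight least-squares line is the simplest example of a line drawn "not through, but among" the points. Friendly and Wainer speak of smoothed lines and curves in general, so choosing ordinary least squares here is an assumption made for illustration, and the sample points are invented. A minimal Python sketch:

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (intercept a, slope b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Slope = covariance(x, y) / variance(x); the line passes through the means
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return a, b

# Invented points scattered around y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1.2, 2.8, 5.1, 7.0, 8.9]
a, b = least_squares_line(xs, ys)
# Vertical distances from each point to the fitted line
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"y = {a:.2f} + {b:.2f}x")  # y = 1.08 + 1.96x
```

None of the residuals is zero, so the line touches none of the points. Because they sum to zero, it still runs through the middle of the cloud, which is the sense of passing "among them".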

"Indeed, among all forms of statistical graphics, the scatterplot may be considered the most versatile and generally useful invention in the entire history of statistical graphics. Essential characteristics of a scatterplot are that two quantitative variables are measured on the same observational units (workers); the values are plotted as points referred to perpendicular axes; and the goal is to show something about the relation between these variables, typically how the ordinate variable, y, varies with the abscissa variable, x." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"[...] scatterplots had advantages over earlier graphic forms: the ability to see clusters, patterns, trends, and relations in a cloud of points. Perhaps most importantly, it allowed the addition of visual annotations (point symbols, lines, curves, enclosing contours, etc.) to make those relationships more coherent and tell more nuanced stories." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Scatterplots are valuable because, without having to inspect each individual point, we can see overall aggregate patterns in potentially thousands of data points. But does this density of information come at a price - just how easy are they to read? [...] The truth is such charts can shed light on complex stories in a way words alone - or simpler charts you might be more familiar with - cannot." (Alan Smith, "How Charts Work: Understand and explain data with confidence", 2022)


About Me
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.