12 December 2011

📉Graphical Representation: Patterns (Just the Quotes)

"Logging skewed variables also helps to reveal the patterns in the data. […] the rescaling of the variables by taking logarithms reduces the nonlinearity in the relationship and removes much of the clutter resulting from the skewed distributions on both variables; in short, the transformation helps clarify the relationship between the two variables. It also […] leads to a theoretically meaningful regression coefficient." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"We can gain further insight into what makes good plots by thinking about the process of visual perception. The eye can assimilate large amounts of visual information, perceive unanticipated structure, and recognize complex patterns; however, certain kinds of patterns are more readily perceived than others. If we thoroughly understood the interaction between the brain, eye, and picture, we could organize displays to take advantage of the things that the eye and brain do best, so that the potentially most important patterns are associated with the most easily perceived visual aspects in the display." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"A good description of the data summarizes the systematic variation and leaves residuals that look structureless. That is, the residuals exhibit no patterns and have no exceptionally large values, or outliers. Any structure present in the residuals indicates an inadequate fit. Looking at the residuals laid out in an overlay helps to spot patterns and outliers and to associate them with their source in the data." (Christopher H Schrnid, "Value Splitting: Taking the Data Apart", 1991)

"Averages, ranges, and histograms all obscure the time-order for the data. If the time-order for the data shows some sort of definite pattern, then the obscuring of this pattern by the use of averages, ranges, or histograms can mislead the user. Since all data occur in time, virtually all data will have a time-order. In some cases this time-order is the essential context which must be preserved in the presentation." (Donald J Wheeler," Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"[...] when data is presented in certain ways, the patterns can be readily perceived. If we can understand how perception works, our knowledge can be translated into rules for displaying information. Following perception‐based rules, we can present our data in such a way that the important and informative patterns stand out. If we disobey the rules, our data will be incomprehensible or misleading." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"Sparklines are wordlike graphics, With an intensity of visual distinctions comparable to words and letters. [...] Words visually present both an overall shape and letter-by-letter detail; since most readers have seen the word previously, the visual task is usually one of quick recognition. Sparklines present an overall shape and aggregate pattern along with plenty of local detail. Sparklines are read the same way as words, although much more carefully and slowly." (Edward R Tufte, "Beautiful Evidence", 2006)

"Graphical displays are often constructed to place principal focus on the individual observations in a dataset, and this is particularly helpful in identifying both the typical positions of datapoints and unusual or influential cases. However, in many investigations, principal interest lies in identifying the nature of underlying trends and relationships between variables, and so it is oten helpful to enhance graphical displays in wayswhich give deeper insight into these features.his can be very beneficial both for small datasets, where variation can obscure underlying patterns, and large datasets, where the volume of data is so large that effective representation inevitably involves suitable summaries." (Adrian W Bowman, "Smoothing Techniques for Visualisation" [in "Handbook of Data Visualization"], 2008)

"Plotting data is a useful first stage to any analysis and will show extreme observations together with any discernible patterns. In addition the relative sizes of categories are easier to see in a diagram (bar chart or pie chart) than in a table. Graphs are useful as they can be assimilated quickly, and are particularly helpful when presenting information to an audience. Tables can be useful for displaying information about many variables at once, while graphs can be useful for showing multiple observations on groups or individuals. Although there are no hard and fast rules about when to use a graph and when to use a table, in the context of a report or a paper it is often best to use tables so that the reader can scrutinise the numbers directly." (Jenny Freeman et al, "How to Display Data", 2008)

"Where there is no natural ordering to the categories it can be helpful to order them by size, as this can help you to pick out any patterns or compare the relative frequencies across groups. As it can be difficult to discern immediately the numbers represented in each of the categories it is good practice to include the number of observations on which the chart is based, together with the percentages in each category." (Jenny Freeman et al, "How to Display Data", 2008)

"Bear in mind is that the use of color doesn’t always help. Use it sparingly and with a specific purpose in mind. Remember that the reader’s brain is looking for patterns, and will expect both recurrence itself and the absence of expected recurrence to carry meaning. If you’re using color to differentiate categorical data, then you need to let the reader know what the categories are. If the dimension of data you’re encoding isn’t significant enough to your message to be labeled or explained in some way - or if there is no dimension to the data underlying your use of difference colors - then you should limit your use so as not to confuse the reader." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"With further similarities to small multiples, heatmaps enable us to perform rapid pattern matching to detect the order and hierarchy of different quantitative values across a matrix of categorical combinations. The use of a color scheme with decreasing saturation or increasing lightness helps create the sense of data magnitude ranking." (Andy Kirk, "Data Visualization: A successful design process", 2012)

"After you visualize your data, there are certain things to look for […]: increasing, decreasing, outliers, or some mix, and of course, be sure you’re not mixing up noise for patterns. Also note how much of a change there is and how prominent the patterns are. How does the difference compare to the randomness in the data? Observations can stand out because of human or mechanical error, because of the uncertainty of estimated values, or because there was a person or thing that stood out from the rest. You should know which it is." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Visualization is what happens when you make the jump from raw data to bar graphs, line charts, and dot plots. […] In its most basic form, visualization is simply mapping data to geometry and color. It works because your brain is wired to find patterns, and you can switch back and forth between the visual and the numbers it represents. This is the important bit. You must make sure that the essence of the data isn’t lost in that back and forth between visual and the value it represents because if you can’t map back to the data, the visualization is just a bunch of shapes." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Graphs can help us interpret data and draw inferences. They can help us see tendencies, patterns, trends, and relationships. A picture can be worth not only a thousand words, but a thousand numbers. However, a graph is essentially descriptive - a picture meant to tell a story. As with any story, bumblers may mangle the punch line and the dishonest may lie." (Gary Smith, "Standard Deviations", 2014)

"Upon discovering a visual image, the brain analyzes it in terms of primitive shapes and colors. Next, unity contours and connections are formed. As well, distinct variations are segmented. Finally, the mind attracts active attention to the significant things it found. That process is permanently running to react to similarities and dissimilarities in shapes, positions, rhythms, colors, and behavior. It can reveal patterns and pattern-violations among the hundreds of data values. That natural ability is the most important thing used in diagramming." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

"The law of continuity states that we interpret images so as not to generate abrupt transitions or otherwise create images that are more complex. […] we can arbitrarily fill in the missing elements to complete a pattern. It’s also the case of time series, in which we assume that data points in the future will be a smooth continuation of the past. […] In a line chart, those series with a similar slope (that is, they appear to follow the same direction) are understood as belonging to the same group." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians." (Daniel J Levitin, "Weaponized Lies", 2017)

"With time series though, there is absolutely no substitute for plotting. The pertinent pattern might end up being a sharp spike followed by a gentle taper down. Or, maybe there are weird plateaus. There could be noisy spikes that have to be filtered out. A good way to look at it is this: means and standard deviations are based on the naïve assumption that data follows pretty bell curves, but there is no corresponding 'default' assumption for time series data (at least, not one that works well with any frequency), so you always have to look at the data to get a sense of what’s normal. [...] Along the lines of figuring out what patterns to expect, when you are exploring time series data, it is immensely useful to be able to zoom in and out." (Field Cady, "The Data Science Handbook", 2017)

"Heat maps are effective visualizations for seeing concentrations as well as patterns. Adding time series to a heat map can also reveal seasonality that may not be obvious otherwise." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Clutter is the main issue to keep in mind when assessing whether a paired bar chart is the right approach. With too many bars, and especially when there are more than two bars for each category, it can be difficult for the reader to see the patterns and determine whether the most important comparison is between or within the different categories." (Jonathan Schwabish, "Better Data Visualizations: A guide for scholars, researchers, and wonks", 2021)

"[...] scatterplots had advantages over earlier graphic forms: the ability to see clusters, patterns, trends, and relations in a cloud of points. Perhaps most importantly, it allowed the addition of visual annotations (point symbols, lines, curves, enclosing contours, etc.) to make those relationships more coherent and tell more nuanced stories." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Before even thinking about charts, it should be recognised that the table on its own is extremely useful. Its clear structure, with destination regions organised in columns and origins in rows, allows the reader to quickly look up any value - including totals - quickly and precisely. That’s what tables are good for. The deficiency of the table, however, is in identifying patterns within the data. Trying to understand the relationships between the numbers is difficult because, to compare the numbers with each other, the reader needs to store a lot of information in working memory, creating what psychologists refer to as a high 'cognitive load'." (Alan Smith, "How Charts Work: Understand and explain data with confidence", 2022)

"Scatterplots are valuable because, without having to inspect each individual point, we can see overall aggregate patterns in potentially thousands of data points. But does this density of information come at a price - just how easy are they to read? [...] The truth is such charts can shed light on complex stories in a way words alone - or simpler charts you might be more familiar with - cannot." (Alan Smith, "How Charts Work: Understand and explain data with confidence", 2022)

"Statistics are not necessarily a good determinant of underlying causes, but they can help you spot patterns - just make sure they’re helpful ones." (Alan Smith, "How Charts Work: Understand and explain data with confidence", 2022)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.