14 November 2011

📉Graphical Representation: Extremes (Just the Quotes)

"Missing data values pose a particularly sticky problem for symbols. For instance, if the ray corresponding to a missing value is simply left off of a star symbol, the result will be almost indistinguishable from a minimum (i.e., an extreme) value. It may be better either (i) to impute a value, perhaps a median for that variable, or a fitted value from some regression on other variables, (ii) to indicate that the value is missing, possibly with a dashed line, or (iii) not to draw the symbol for a particular observation if any value is missing." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Skewness is a measure of symmetry. For example, it's zero for the bell-shaped normal curve, which is perfectly symmetric about its mean. Kurtosis is a measure of the peakedness, or fat-tailedness, of a distribution. Thus, it measures the likelihood of extreme values." (John L Casti, "Reality Rules: Picturing the world in mathematics", 1992)

"If the underlying pattern of the data has gentle curvature with no local maxima and minima, then locally linear fitting is usually sufficient. But if there are local maxima or minima, then locally quadratic fitting typically does a better job of following the pattern of the data and maintaining local smoothness." (William S Cleveland, "Visualizing Data", 1993)

"Variance and its square root, the standard deviation, summarize the amount of spread around the mean, or how much a variable varies. Outliers influence these statistics too, even more than they influence the mean. On the other hand. the variance and standard deviation have important mathematical advantages that make them (together with the mean) the foundation of classical statistics. If a distribution appears reasonably symmetrical, with no extreme outliers, then the mean and standard deviation or variance are the summaries most analysts would use." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

"Clearly, the mean is greatly influenced by extreme values, but it can be appropriate for many situations where extreme values do not arise. To avoid misuse, it is essential to know which summary measure best reflects the data and to use it carefully. Understanding the situation is necessary for making the right choice. Know the subject!" (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"A feature shared by both the range and the interquartile range is that they are each calculated on the basis of just two values - the range uses the maximum and the minimum values, while the IQR uses the two quartiles. The standard deviation, on the other hand, has the distinction of using, directly, every value in the set as part of its calculation. In terms of representativeness, this is a great strength. But the chief drawback of the standard deviation is that, conceptually, it is harder to grasp than other more intuitive measures of spread." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Many scientists who work not just with noise but with probability make a common mistake: They assume that a bell curve is automatically Gauss's bell curve. Empirical tests with real data can often show that such an assumption is false. The result can be a noise model that grossly misrepresents the real noise pattern. It also favors a limited view of what counts as normal versus non-normal or abnormal behavior. This assumption is especially troubling when applied to human behavior. It can also lead one to dismiss extreme data as error when in fact the data is part of a pattern." (Bart Kosko, "Noise", 2006)

"Standard quantile graphs offer certain advantages over cumulative percent frequency graphs. Among these advantages are ease of construction, actual data points are shown as opposed to summaries of class intervals, no decisions are required as to what the best size class interval might be, the same curve functions as a less-than and greater-than curve, and the actual maximum and minimum values are shown on the graph." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"[…] an outlier is an observation that lies an 'abnormal' distance from other values in a batch of data. There are two possible explanations for the occurrence of an outlier. One is that this happens to be a rare but valid data item that is either extremely large or extremely small. The other is that it is a mistake - maybe due to a measuring or recording error." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Plotting data is a useful first stage to any analysis and will show extreme observations together with any discernible patterns. In addition the relative sizes of categories are easier to see in a diagram" (bar chart or pie chart) than in a table. Graphs are useful as they can be assimilated quickly, and are particularly helpful when presenting information to an audience. Tables can be useful for displaying information about many variables at once, while graphs can be useful for showing multiple observations on groups or individuals. Although there are no hard and fast rules about when to use a graph and when to use a table, in the context of a report or a paper it is often best to use tables so that the reader can scrutinise the numbers directly." (Jenny Freeman et al, "How to Display Data", 2008)

13 November 2011

📉Graphical Representation: Density (Just the Quotes)

"Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less. One of the great assets of graphical techniques is that they can convey large amounts of information in a small space." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984) 

"Equal variability is not always achieved in plots. For instance, if the theoretical distribution for a probability plot has a density that drops off gradually to zero in the tails (as the normal density does), then the variability of the data in the tails of the probability plot is greater than in the center. Another example is provided by the histogram. Since the height of any one bar has a binomial distribution, the standard deviation of the height is approximately proportional to the square root of the expected height; hence, the variability of the longer bars is greater." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"[…] the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies. […] Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Visual displays rich with data are not only an appropriate and proper complement to human capabilities, but also such designs are frequently optimal. If the visual task is contrast, comparison, and choice - as so often it is - then the more relevant information within eyespan, the better. Vacant, low-density displays, the dreaded posterization of data spread over pages and pages, require viewers to rely on visual memory - a weak skill - to make a contrast, a comparison, a choice." (Edward R Tufte, "Envisioning Information", 1990)

"We envision information in order to reason about, communicate, document, and preserve that knowledge - activities nearly always carried out on two-dimensional paper and computer screen. Escaping this flatland and enriching the density of data displays are the essential tasks of information design." (Edward R Tufte, "Envisioning Information", 1990)

"Using colour, itʼs possible to increase the density of information even further. A single colour can be used to represent two variables simultaneously. The difficulty, however, is that there is a limited amount of information that can be packed into colour without confusion." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"The use of the density scale to construct the histogram ensures that the area of each rectangle in the histogram will be proportional to the corresponding relative frequency. The formula for density can also be used when class widths are equal. However, when the intervals are of equal width, the extra arithmetic required to obtain the densities is unnecessary." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"Linking is a powerful dynamic interactive graphics technique that can help us better understand high-dimensional data. This technique works in the following way: When several plots are linked, selecting an observation's point in a plot will do more than highlight the observation in the plot we are interacting with - it will also highlight points in other plots with which it is linked, giving us a more complete idea of its value across all the variables. Selecting is done interactively with a pointing device. The point selected, and corresponding points in the other linked plots, are highlighted simultaneously. Thus, we can select a cluster of points in one plot and see if it corresponds to a cluster in any other plot, enabling us to investigate the high-dimensional shape and density of the cluster of points, and permitting us to investigate the structure of the disease space." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"When there are few data points, place the data labels directly on the data. Data density refers to the amount of data shown in a visualization through encodings (points, bars, lines, etc.). A common mistake is presenting too much data in a single data graph. The data itself can obscure the insight. It can make the chart unreadable because the data values are not discernible. Examples include: overlapping data points, too many lines in a line chart, or too many slices in a pie chart. Selecting the appropriate amount of data requires a delicate balance. It is your job to determine how much detail is necessary." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

📉Graphical Representation: Missing Data (Just the Quotes)

"Missing data values pose a particularly sticky problem for symbols. For instance, if the ray corresponding to a missing value is simply left off of a star symbol, the result will be almost indistinguishable from a minimum (i.e., an extreme) value. It may be better either (i) to impute a value, perhaps a median for that variable, or a fitted value from some regression on other variables, (ii) to indicate that the value is missing, possibly with a dashed line, or (iii) not to draw the symbol for a particular observation if any value is missing." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"We often think, naïvely, that missing data are the primary impediments to intellectual progress - just find the right facts and all problems will dissipate. But barriers are often deeper and more abstract in thought. We must have access to the right metaphor, not only to the requisite information. Revolutionary thinkers are not, primarily, gatherers of facts, but weavers of new intellectual structures." (Stephen J Gould, "The Flamingo's Smile: Reflections in Natural History", 1985)

"Statistics depend on collecting information. If questions go unasked, or if they are asked in ways that limit responses, or if measures count some cases but exclude others, information goes ungathered, and missing numbers result. Nevertheless, choices regarding which data to collect and how to go about collecting the information are inevitable." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"People tend to give greater weight to the data that they have just been exposed to than other relevant data. […] This phenomenon, where people give greater attention to recent or easily available data, is often referred to as an availability error." (Alan Graham, "Developing Thinking in Statistics", 2006)

"There are several key issues in the field of statistics that impact our analyses once data have been imported into a software program. These data issues are commonly referred to as the measurement scale of variables, restriction in the range of data, missing data values, outliers, linearity, and nonnormality." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"[…] events will always occur that cannot be foreseen by following a chain of logical deductive reasoning. Successful prediction requires intuitive leaps and/or information that is not part of the original data available." (John L Casti, "X-Events: The Collapse of Everything", 2012)

"Missing data is the blind spot of statisticians. If they are not paying full attention, they lose track of these little details. Even when they notice, many unwittingly sway things our way. Most ranking systems ignore missing values." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Having NUMBERSENSE means: (•) Not taking published data at face value; (•) Knowing which questions to ask; (•) Having a nose for doctored statistics. [...] NUMBERSENSE is that bit of skepticism, urge to probe, and desire to verify. It’s having the truffle hog’s nose to hunt the delicacies. Developing NUMBERSENSE takes training and patience. It is essential to know a few basic statistical concepts. Understanding the nature of means, medians, and percentile ranks is important. Breaking down ratios into components facilitates clear thinking. Ratios can also be interpreted as weighted averages, with those weights arranged by rules of inclusion and exclusion. Missing data must be carefully vetted, especially when they are substituted with statistical estimates. Blatant fraud, while difficult to detect, is often exposed by inconsistency." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Accuracy and coherence are related concepts pertaining to data quality. Accuracy refers to the comprehensiveness or extent of missing data, performance of error edits, and other quality assurance strategies. Coherence is the degree to which data - item value and meaning are consistent over time and are comparable to similar variables from other routinely used data sources." (Aileen Rothbard, "Quality Issues in the Use of Administrative Data Records", 2015)

"There are several key issues in the field of statistics that impact our analyses once data have been imported into a software program. These data issues are commonly referred to as the measurement scale of variables, restriction in the range of data, missing data values, outliers, linearity, and nonnormality." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"[…] people attempt to use highly flexible mathematical structures with large numbers of parameters that can be adjusted to fit the data, the result often being models that fit the data well but lack structural representation of the phenomena and thus are not predictive outside the range of the data. The situation is exacerbated by uncertainty regarding model parameters on account of insufficient data relative to model complexity, which in fact means uncertainty regarding the models themselves. More importantly from the standpoint of epistemology, the amount of available data is often miniscule in comparison to the amount needed for validation. The desire for knowledge has far outstripped experimental/observational capability. We are starved for data." (Edward R Dougherty, "The Evolution of Scientific Knowledge: From certainty to uncertainty", 2016)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Unless we’re collecting data ourselves, there’s a limit to how much we can do to combat the problem of missing data. But we can and should remember to ask who or what might be missing from the data we’re being told about. Some missing numbers are obvious […]. Other omissions show up only when we take a close look at the claim in question." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Correlation does not imply causation: often some other missing third variable is influencing both of the variables you are correlating. […] The need for a scatterplot arose when scientists had to examine bivariate relations between distinct variables directly. As opposed to other graphic forms - pie charts, line graphs, and bar charts - the scatterplot offered a unique advantage: the possibility to discover regularity in empirical data (shown as points) by adding smoothed lines or curves designed to pass 'not through, but among them', so as to pass from raw data to a theory-based description, analysis, and understanding." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

12 November 2011

📉Graphical Representation: Exploration (Just the Quotes)

"Modern data graphics can do much more than simply substitute for small statistical tables. At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers even a very large set - is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Working with binned data directly addresses large data set issues of computation and plotting speed. Almost everything that can bc done with the original data can be done faster with binned data. Further, working with binned data allows image processing algorithms to be adapted and applied to bin cells. Thus tools can bc brought to bare that are not traditionally associated with exploratory data analysis." (Daniel B Carr, "Looking at Large Data Sets Using Binned Data Plots", [in "Computing and Graphics in Statistics"] 1991)

"The scatterplot is a useful exploratory method for providing a first look at bivariate data to see how they are distributed throughout the plane, for example, to see clusters of points, outliers, and so forth." (William S Cleveland, "Visualizing Data", 1993)

"Construction refers to everything involved in the production of the graphical display, including questions of what to plot and how to plot. Deciding what to plot is not always easy and again depends on what we want to accomplish. In the initial phases of an analysis, two-dimensional displays of the response against each of the p predictors are obvious choices for gaining insights about the data, choices that are often recommended in the introductory regression literature. Displays of residuals from an initial exploratory fit are frequently used as well." (R Dennis Cook, "Regression Graphics: Ideas for studying regressions through graphics", 1998)

"If we attempt to map the world of a story before we explore it, we are likely either to (a) prematurely limit our exploration, so as to reduce the amount of material we need to consider, or" (b) explore at length but, recognizing the impossibility of taking note of everything, and having no sound basis for choosing what to include, arbitrarily omit entire realms of information. The opportunities are overwhelming." (Peter Turchi, "Maps of the Imagination: The writer as cartographer", 2004)

"Clearly principles and guidelines for good presentation graphics have a role to play in exploratory graphics, but personal taste and individual working style also play important roles. The same data may be presented in many alternative ways, and taste and customs differ as to what is regarded as a good presentation graphic. Nevertheless, there are principles that should be respected and guidelines that are generally worth following. No one should expect a perfect consensus where graphics are concerned." (Antony Unwin, Good Graphics?"[in "Handbook of Data Visualization"], 2008)

"There are two main reasons for using graphic displays of datasets: either to present or to explore data. Presenting data involves deciding what information you want to convey and drawing a display appropriate for the content and for the intended audience. [...] Exploring data is a much more individual matter, using graphics to find information and to generate ideas.Many displays may be drawn. They can be changed at will or discarded and new versions prepared, so generally no one plot is especially important, and they all have a short life span." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"All graphics present data and allow a certain degree of exploration of those same data. Some graphics are almost all presentation, so they allow just a limited amount of exploration; hence we can say they are more infographics than visualization, whereas others are mostly about letting readers play with what is being shown, tilting more to the visualization side of our linear scale. But every infographic and every visualization has a presentation and an exploration component: they present, but they also facilitate the analysis of what they show, to different degrees." (Alberto Cairo, "The Functional Art", 2011)

"A viewer’s eye must be guided to 'read' the elements in a logical order. The design of an exploratory graphic needs to allow for the additional component of discovery - guiding the viewer to first understand the overall concept and then engage her to further explore the supporting information." (Felice C Frankel & Angela H DePace, "Visual Strategies", 2012)

"The process of visual analysis can potentially go on endlessly, with seemingly infinite combinations of variables to explore, especially with the rich opportunities bigger data sets give us. However, by deploying a disciplined and sensible balance between deductive and inductive enquiry you should be able to efficiently and effectively navigate towards the source of the most compelling stories." (Andy Kirk, "Data Visualization: A successful design process", 2012)

"Early exploration of a dataset can be overwhelming, because you don’t know where to start. Ask questions about the data and let your curiosities guide you. […] Make multiple charts, compare all your variables, and see if there are interesting bits that are worth a closer look. Look at your data as a whole and then zoom in on categories and individual data points. […] Subcategories, the categories within categories" (within categories), are often more revealing than the main categories. As you drill down, there can be higher variability and more interesting things to see." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Good visualization is a winding process that requires statistics and design knowledge. Without the former, the visualization becomes an exercise only in illustration and aesthetics, and without the latter, one of only analyses. On their own, these are fine skills, but they make for incomplete data graphics. Having skills in both provides you with the luxury - which is growing into a necessity - to jump back and forth between data exploration and storytelling." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Put everything together - from understanding data, to exploration, clarity, and adapting to an audience - and you get a general process for how to make data graphics. " (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Visualization can be appreciated purely from an aesthetic point of view, but it’s most interesting when it’s about data that’s worth looking at. That’s why you start with data, explore it, and then show results rather than start with a visual and try to squeeze a dataset into it. It’s like trying to use a hammer to bang in a bunch of screws. […] Aesthetics isn’t just a shiny veneer that you slap on at the last minute. It represents the thought you put into a visualization, which is tightly coupled with clarity and affects interpretation." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"[...] communicating with data is less often about telling a specific story and more like starting a guided conversation. It is a dialogue with the audience rather than a monologue. While some data presentations may share the linear approach of a traditional story, other data products" (analytical tools, in particular) give audiences the flexibility for exploration. In our experience, the best data products combine a little of both: a clear sense of direction defined by the author with the ability for audiences to focus on the information that is most relevant to them. The attributes of the traditional story approach combined with the self-exploration approach leads to the guided safari analogy." (Zach Gemignani et al, "Data Fluency", 2014)

"Exploratory analysis is what you do to understand the data and figure out what might be noteworthy or interesting to highlight to others." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"Exploring data generates hypotheses about patterns in our data. The visualizations and tools of dynamic interactive graphics ease and improve the exploration, helping us to 'see what our data seem to say'." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"With time series though, there is absolutely no substitute for plotting. The pertinent pattern might end up being a sharp spike followed by a gentle taper down. Or, maybe there are weird plateaus. There could be noisy spikes that have to be filtered out. A good way to look at it is this: means and standard deviations are based on the naïve assumption that data follows pretty bell curves, but there is no corresponding 'default' assumption for time series data (at least, not one that works well with any frequency), so you always have to look at the data to get a sense of what’s normal. [...] Along the lines of figuring out what patterns to expect, when you are exploring time series data, it is immensely useful to be able to zoom in and out." (Field Cady, "The Data Science Handbook", 2017)

"Models are formal structures represented in mathematics and diagrams that help us to understand the world. Mastery of models improves your ability to reason, explain, design, communicate, act, predict, and explore." (Scott E Page, "The Model Thinker", 2018)

"The way we explore data today, we often aren't constrained by rigid hypothesis testing or statistical rigor that can slow down the process to a crawl. But we need to be careful with this rapid pace of exploration, too. Modern business intelligence and analytics tools allow us to do so much with data so quickly that it can be easy to fall into a pitfall by creating a chart that misleads us in the early stages of the process." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"Data that is well prepared makes the analysis easier and allows a deeper exploration of patterns. It helps the analyst sift through the data with less friction. Data that is well crafted holds up to rigorous analysis and presentation. It removes the wall between us and the data and allows us to see the patterns. Well-shaped data isn't only functional, it's also aesthetic." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"We define analytical intent to be the goal that a consumer or analyst focuses on when performing either targeted or more open-ended data exploration and discovery. Analytical intent is expressed as part of a conversation between the user and a visualization interface." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Charts used to confirm are less formal, and designed well enough to be interpreted, but they don’t always have to be presentation worthy. […] Or maybe you don’t know what you’re looking for […] This is exploratory work - rougher still in design, usually iterative, sometimes interactive. Most of us don’t do as much exploratory work as we do declarative and confirmatory; we should do more. It’s a kind of data brainstorming." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Confirmation is a kind of focused exploration, whereas true exploration is more open-ended. The bigger and more complex the data, and the less you know going in, the more exploratory the work. If confirmation is hiking a new trail, exploration is blazing one." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

📉Graphical Representation: Expert Perspectives (Just the Quotes)

"Absorb the data. Read it, re-read it, read it backwards and understand the lyrical and human-centred contribution." (Kate McLean) [1]

"Admit that nothing you create on deadline will be perfect. However, it should never be wrong. I try to work by a motto my editor likes to say: 'No Heroics. Your code may not be beautiful, but if it works, it’s good enough.' A visualisation may not have every feature you could possibly want, but if it gets the message across and is useful to people, it’s good enough. Being 'good enough' is not an insult in journalism – it’s a necessity." (Lena Groeger) [1]

"After the data exploration phase you may come to the conclusion that the data does not support the goal of the project. The thing is: data is leading in a data visualization project – you cannot make up some data just to comply with your initial ideas. So, you need to have some kind of an open mind and 'listen to what the data has to say' and learn what its potential is for a visualization. Sometimes this means that a project has to stop if there is too much of a mismatch between the goal of the project and the available data. In other cases this may mean that the goal needs to be adjusted and the project can continue." (Jan Willem Tulp) [1]

"Although all our projects are very much data driven, visualisation is only part of the products and solutions we create. This day and age provides us with amazing opportunities to combine video, animation, visualisation, sound and interactivity. Why not make full use of this? Judging whether to include something or not is all about editing: asking 'is it really necessary?'. There is always an aspect of gut feel or instinct mixed with continuous doubt that drives me in these cases." (Thomas Clever) [1]

"At the beginning, there’s a process of 'interviewing' the data – first evaluating their source and means of collection/aggregation/computation, and then trying to get a sense of what they say – and how well they say it via quick sketches in Excel with pivot tables and charts. Do the data, in various slices, say anything interesting? If I’m coming into this with certain assumptions, do the data confirm them, or refute them?" (Alyson Hurt) [1]

"Context is key. You’ll hear that the most important quality of a visualisation is graphical honesty, or storytelling value, or facilitation of 'insights'. The truth is, all of these things (and others) are the most important quality, but in different times and places. There is no singular function of visualisation; what’s important shifts with the constraints of your audience, goals, tools, expertise, and data and time available.’ (Scott Murray) [1]

"Data and data sets are not objective; they are creations of human design. Hidden biases in both the collection and analysis stages present considerable risks [in terms of inference]." (Kate Crawford) [1]

"Data inspires me. I always open the data in its native format and look at the raw data just to get the lay of the land. It’s much like looking at a map to begin a journey." (Kim Rees) [1]

"'Everything must have a reason.' A principle that I learned as a graphic designer that still applies to data visualization. In essence, everything needs to be rationalized and have a logic to why it’s in the design/visualization, or it’s out." (Stefanie Posavec) [1]

"Good design is honest. It does not make a product appear more innovative, powerful or valuable than it really is. It does not attempt to manipulate the consumer with promises that cannot be kept." (Dieter Rams) [1]

"I focus on structural exploration on one side and on the reality and the landscape of opportunities in the other […] I try not to impose any early ideas of what the result will look like because that will emerge from the process. In a nutshell I first activate data curiosity, client curiosity, and then visual imagination in parallel with experimentation." (Santiago Ortiz) [1]

"I kick it over into a rough picture as soon as possible. When I can see something then I am able to ask better questions of it – then the what-about-this iterations begin. I try to look at the same data in as many different dimensions as possible. For example, if I have a spreadsheet of bird sighting locations and times, first I like to see where they happen, previewing it in some mapping software. I’ll also look for patterns in the timing of the phenomenon, usually using a pivot table in a spreadsheet. The real magic happens when a pattern reveals itself only when seen in both dimensions at the same time." (John Nelson) [1]

"I say begin by learning about data visualization’s 'black and whites', the rules, then start looking for the greys. It really then becomes quite a personal journey of developing your conviction." (Jorge Camoes) [1]

"I suppose one could say our work has a certain signature. Style, to me, has a negative connotation of 'slapped on' = to prettify something without much meaning. We don’t make it our goal to have a recognisable (visual) signature, instead to create work that truly matters and is unique. Pretty much all our projects are bespoke and have a different end result. That is one of the reasons why we are more concerned with working according to values and principles that transcend individual projects and I believe that is what makes our work recognisable." (Thomas Clever) [1]

"I think this is something I’ve learned from experience rather than advice that was passed on. Less can often be more. In other words, don’t get carried away and try to tell the reader everything there is to know on a subject. Know what it is that you want to show the reader and don’t stray from that. I often find myself asking others 'do we need to show this?' or 'is this really necessary?' Let’s take it out." (Simon Scarr) [1]

"I truly feel that experimentation (even for the sake of experimentation) is important, and I would strongly encourage it. There are infinite possibilities in diagramming and visual communication, so we have much to explore yet. I think a good rule of thumb is to never allow your design or implementation to obscure the reader understanding the central point of your piece. However, I’d even be willing to forsake this, at times, to allow for innovation and experimentation. It ends up moving us all forward, in some way or another." (Kennedy Elliott) [1]

"I’m obsessed with alignments. Sloppy label placement on final files causes my confidence in the designer to flag. What other details haven’t been given full attention? Has the data been handled sloppily as well? [...] On the flip side, clean, layered, and logically built final files are a thing of beauty and my confidence in the designer, and their attention to detail, soars." (Jen Christiansen) [1]

"I’ve come to believe that pure beautiful visual works are somehow relevant in everyday life, because they can become a trigger to get people curious to explore the contents these visuals convey. I like the idea of making people say 'oh that’s beautiful! I want to know what this is about!' I think that probably (or, at least, lots of people pointed that out to us) being Italians plays its role on this idea of 'making things not only functional but beautiful'." (Giorgia Lupi) [1]

"It is easy to immerse yourself in a certain idea, but I think it is important to step back regularly and recognize that other people have different ways of interpreting things. I am very fortunate to work with people whom I greatly admire and who also see things from a different perspective. Their feedback is invaluable in the process." (Jane Pong) [1]

"Look at how other designers solve visual problems (but don’t copy the look of their solutions). Look at art to see how great painters use space, and organise the elements of their pictures. Look back at the history of infographics. It’s all been done before, and usually by hand! Draw something with a pencil (or pen [...] but NOT a computer!). Sketch often: The cat asleep. The view from the bus. The bus. Personally, I listen to music – mostly jazz – a lot." (Nigel Holmes) [1]

"My design approach requires that I immerse myself deeply in the problem domain and available data very early in the project, to get a feel for the unique characteristics of the data, its 'texture' and the affordances it brings. It is very important that the results from these explorations, which I also discuss in detail with my clients, can influence the basic concept and main direction of the project. To put it in Hans Rosling’s words, you need to 'let the data set change your mind set'." (Moritz Stefaner) [1]

"My main advice is not to be disheartened. Sometimes the data don’t show what you thought they would, or they aren’t available in a usable or comparable form. But [in my world] sometimes that research still turns up threads a reporter could pursue and turn into a really interesting story – there just might not be a viz in it. Or maybe there’s no story at all. And that’s all okay. At minimum, you’ve still hopefully learned something new in the process about a topic, or a data source (person or database), or a 'gotcha' in a particular dataset – lessons that can be applied to another project down the line." (Alyson Hurt) [1]

"Research is key. Data, without interpretation, is just a jumble of words and numbers – out of context and devoid of meaning. If done well, research not only provides a solid foundation upon which to build your graphic/visualisation, but also acts as a source of inspiration and a guidebook for creativity. A good researcher must be a team player with the ability to think critically, analytically, and creatively. They should be a proactive problem solver, identifying potential pitfalls and providing various roadmaps for overcoming them. In short, their inclusion should amplify, not restrain, the talents of others." (Amanda Hobbs) [1]

"The capability to cope with the technological dimension is a key attribute of successful students: coding – more as a logic and a mindset than a technical task – is becoming a very important asset for designers who want to work in Data Visualization. It doesn’t necessarily mean that you need to be able to code to find a job, but it helps a lot in the design process. The profile in the (near) future will be a hybrid one, mixing competences, skills and approaches currently separated into disciplinary silos." (Paolo Ciuccarelli) [1]

"The experience offered by a visualisation influences the interpreting phase of understanding. Whereas tone embodies a continuum, the judgement of the most suitable experience is more distinct and concerns different methods of enabling interpretation: explanatory, exhibitory or exploratory [...] you degrade its existence and malign its importance. Words are not your enemy. Complex thoughts are not your enemy. Confusion is. Don’t confuse your audience. Don’t talk down to them, don’t mislead them, and certainly don’t lie to them." (Amanda Hobbs) [1]

"The key difference I think in producing data visualization/infographics in the service of journalism versus other contexts (like art) is that there is always an underlying, ultimate goal: to be useful. Not just beautiful or efficient – although something can (and should!) be all of those things. But journalism presents a certain set of constraints. A journalist has to always ask the question: How can I make this more useful? How can what I am creating help someone, teach someone, show someone something new?" (Lena Groeger) [1]

"There's a strand of the data viz world that argues that everything could be a bar chart. That’s possibly true but also possibly a world without joy." (Amanda Cox, [interview in Scott Berinato, "The Power of Visualization’s 'Aha!' Moments", Harvard Business Review] 2013) [1]

"Think of the reader – a specific reader, like a friend who’s curious but a novice to the subject and to data-viz – when designing the graphic. That helps. And I rely pretty heavily on that introductory text that runs with each graphic – about 100 words, usually, that should give the new-to-the-subject reader enough background to understand why this graphic is worth engaging with and sets them up to understand and contextualize the takeaway. And annotate the graphic itself. If there’s a particular point you want the reader to understand, make it! Explicitly!" (Katie Peek) [1]

"Using our eyes to switch between different views that are visible simultaneously has much lower cognitive load than consulting our memory to compare a current view with what was seen before." (Tamara Munzner) [1]

"We should pay as much attention to understanding the project’s goal in relation to its audience. This involves understanding principles of perception and cognition in addition to other relevant factors, such as culture and education levels, for example. More importantly, it means carefully matching the tasks in the representation to our audience’s needs, expectations, expertise, etc. Visualizations are human-centred projects, in that they are not universal and will not be effective for all humans uniformly. As producers of visualizations, whether devised for data exploration or communication of information, we need to take into careful consideration those on the other side of the equation, and who will face the challenges of decoding our representations." (Isabel Meirelles) [1]

"What is the least this can be? What is the minimum result that will 1) be factually accurate, 2) present the core concepts of this story in a way that a general audience will understand, and 3) be readable on a variety of screen sizes (desktop, mobile, etc.)? And then I judge what else can be done based on the time I have. Certainly, when we’re down to the wire it’s no time to introduce complex new features that require lots of testing and could potentially break other, working features." (Alyson Hurt) [1]

"When I first started learning about visualisation, I naively assumed that datasets arrived at your doorstep ready to roll. Begrudgingly I accepted that before you can plot or graph anything, you have to find the data, understand it, evaluate it, clean it, and perhaps restructure it." (Marcia Gray) [1]

"When something is not harmonious, it’s either boring or chaotic. At one extreme is a visual experience that is so bland that the viewer is not engaged. The human brain will reject understimulating information. At the other extreme is a visual experience that is so overdone, so chaotic, that the viewer can’t stand to look at it. The human brain rejects what it cannot organize, what it cannot understand." (Jill Morton) [1]

"When the data has been explored sufficiently, it is time to sit down and reflect – what were the most interesting insights? What surprised me? What were the recurring themes and facts throughout all views on the data? In the end, what do we find most important and most interesting? These are the things that will govern which angles and perspectives we want to emphasize in the subsequent project phases." (Moritz Stefaner) [1]

"You don’t get there [beauty] with cosmetics, you get there by taking care of the details, by polishing and refining what you have. This is ultimately a matter of trained taste, or what German speakers call fingerspitzengefühl ('finger-tip-feeling')." (Oliver Reichenstein) [1]

References:
[1] Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019

SQL Server New Features: SQL Server 2012 is almost here

    I was quite quiet for the past 3-4 months, not for lack of blogging material but for lack of time. Instead of writing I preferred reading, diving into some special topics related to SQL Server (e.g. tempdb and security); I intend to post some of my notes in the near future. For a short time I was busy studying for the ITIL® v3 Foundation Certification, and the topics on Knowledge Management gave me more ideas for several posts waiting in the pipeline. I also started the online “Introduction to Databases” course offered by Stanford University, thus attempting a scholastic approach to the topic; of particular importance is the material on Relational Algebra, which I didn’t have the chance to study in the past.
    From my perspective, two important events related to SQL Server took place during this time: the launch of Dynamics AX 2012 and, more recently, the introduction of SQL Server 2012 at PASS (the Professional Association for SQL Server) Summit 2011.

SQL Server 2012
    Four of the newest SQL Server products were disclosed at PASS Summit 2011: SQL Server 2012 (code name Denali), Power View (code name Crescent), ColumnStore Index (code name Apollo) and SQL Server Data Tools (code name Juneau). The streamed PASS 2011 sessions are available online, with quite interesting material on SQL Server topics like application and database development, database administration and deployment, BI, etc. If you want to learn more about SQL Server, check the CTP 3 Product Guide, which contains datasheets, white papers, technical presentations, demonstrations and links to videos, or the SQL Server 2012 Developer Training Kit Preview (requires Microsoft’s Web Platform Installer).

Dynamics AX 2012
    Because lately I’ve been spending more and more time with Dynamics AX, Microsoft’s ERP (Enterprise Resource Planning) solution, I’d like to include related content in my posts, at least by presenting resources if I can’t yet get into technical stuff. As its backend is based mainly on SQL Server, AX is the perfect environment in which to see SQL Server at work, or to perform configuration and administration activities. In addition, AX material related to SQL Server (best/good practices, methodologies, various other papers) could be extended to other environments. I salute Microsoft’s decision to make more TechNet and MSDN content publicly available; previously most of the technical content was accessible mainly through Microsoft’s Partner Network and Customer Network. A good compilation of resources is available on the AX Technical Support Blog and the Inside Microsoft Dynamics AX blog.
    As pointed out above, Microsoft Dynamics AX 2012 was launched recently (see global and local launch events). It’s interesting to point out that, with this edition, SSRS becomes the reporting platform for AX, a considerable step forward.

Books
     As for free books, there are 3 “new” appearances: Jonathan Kehayias and Ted Krueger’s book Troubleshooting SQL Server: A Guide for the Accidental DBA (zipped PDF), which provides a basic approach to troubleshooting, Fabiano Amorim’s book Complete Showplan Operators (PDF, Epub), and Ross Mistry and Stacia Misner’s Introducing Microsoft SQL Server 2008 R2 (PDF, requires registration).

11 November 2011

📉Graphical Representation: Structure (Just the Quotes)

"Graphic charts have often been thought to be tools of those alone who are highly skilled in mathematics, but one needs to have a knowledge of only eighth-grade arithmetic to use intelligently even the logarithmic or ratio chart, which is considered so difficult by those unfamiliar with it. […] If graphic methods are to be most effective, those who are unfamiliar with charts must give some attention to their fundamental structure. Even simple charts may be misinterpreted unless they are thoroughly understood. For instance, one is not likely to read an arithmetic chart correctly unless he also appreciates the significance of a logarithmic chart." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"Structured information is any type of information that is arranged to show relationships between the minute, individual particles (bits) of information and the final presentation of this information in a logical arrangement with continuity from beginning to end." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Frequently we can increase the informativeness of a graph by removing structure from the data once we have identified it, so that subsequent plots are free of its dominating influence and can help us see finer structure or subtler effects. This usually means (1) partitioning the data, or (2) plotting differences or ratios, or (3) fitting a model and taking the residuals as a new set of data for further study." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"The truth is that one display is better than another if it leads to more understanding. Often a simpler display, one that tries to accomplish less at one time, succeeds in conveying more insight. In order to understand complicated or subtle structure in the data we should be prepared to look at complicated displays when necessary, but to see any particular type of structure we should use the simplest display that shows it." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"In order to be easily understood, a display of information must have a logical structure which is appropriate for the user's knowledge and needs, and this structure must be clearly represented visually. In order to indicate structure, it is necessary to be able to emphasize, divide and relate items of information. Visual emphasis can be used to indicate a hierarchical relationship between items of information, as in the case of systems of headings and subheadings for example. Visual separation of items can be used to indicate that they are different in kind or are unrelated functionally, and similarly a visual relationship between items will imply that they are of a similar kind or bear some functional relation to one another. This kind of visual 'coding' helps the reader to appreciate the extent and nature of the relationship between items of information, and to adopt an appropriate scanning strategy." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"One important aspect of reality is improvisation; as a result of special structure in a set of data, or the finding of a visualization method, we stray from the standard methods for the data type to exploit the structure or the finding." (William S Cleveland, "Visualizing Data", 1993)

"The logarithm is one of many transformations that we can apply to univariate measurements. The square root is another. Transformation is a critical tool for visualization or for any other mode of data analysis because it can substantially simplify the structure of a set of data. For example, transformation can remove skewness toward large values, and it can remove monotone increasing spread. And often, it is the logarithm that achieves this removal." (William S Cleveland, "Visualizing Data", 1993)
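
A small aside to make Cleveland's point about skewness concrete. This is only a sketch: the data are made up, and the moment-based skewness formula (third central moment divided by the 1.5 power of the second) is one common definition, chosen here for illustration.

```python
import math

def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5 (population moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical measurements that double at each step (skewed toward large values)
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in raw]

raw_skew = skewness(raw)      # clearly positive
log_skew = skewness(logged)   # essentially zero: the logs are evenly spaced
print(f"raw: {raw_skew:.3f}, log: {log_skew:.3f}")
```

The example is deliberately extreme (the logged values are perfectly symmetric); real data rarely behave this neatly.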

"A good graph displays relationships and structures that are difficult to detect by merely looking at the data." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Stacked bar graphs do not show data structure well. A trend in one of the stacked variables has to be deduced by scanning along the vertical bars. This becomes especially difficult when the categories do not move in the same direction." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The content and context of the numerical data determines the most appropriate mode of presentation. A few numbers can be listed, many numbers require a table. Relationships among numbers can be displayed by statistics. However, statistics, of necessity, are summary quantities so they cannot fully display the relationships, so a graph can be used to demonstrate them visually. The attractiveness of the form of the presentation is determined by word layout, data structure, and design." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"A grammar of graphics facilitates coordinated activity in a set of relatively autonomous components. This grammar enables us to develop a system in which adding a graphic to a frame (say, a surface) requires no adjustments or changes in definitions other than the simple message 'add this graphic'. Similarly, we can remove graphics, transform scales, permute attributes, and make other alterations without redefining the basic structure." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Merely drawing a plot does not constitute visualization. Visualization is about conveying important information to the reader accurately. It should reveal information that is in the data and should not impose structure on the data." (Robert Gentleman, "Bioinformatics and Computational Biology Solutions using R and Bioconductor", 2005)

"A diagram is a graphic shorthand. Though it is an ideogram, it is not necessarily an abstraction. It is a representation of something in that it is not the thing itself. In this sense, it cannot help but be embodied. It can never be free of value or meaning, even when it attempts to express relationships of formation and their processes. At the same time, a diagram is neither a structure nor an abstraction of structure." (Peter Eisenman, "Written Into the Void: Selected Writings, 1990-2004", 2007)

"Data visualization [...] expresses the idea that it involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid readers or viewers in seeing the structure in the data. The term data visualization is related to the new field of information visualization. This includes visualization of all kinds of information, not just of data, and is closely associated with research by computer scientists." (Antony Unwin et al, "Introduction" [in "Handbook of Data Visualization"], 2008)

"Tables work in a variety of situations because they convey large amounts of data in a condensed fashion. Use tables in the following situations: (1) to structure data so the reader can easily pick out the information desired, (2) to display in a chart when the data contains too many variables or values, and (3) to display exact values that are more important than a visual moment in time." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Models are formal structures represented in mathematics and diagrams that help us to understand the world. Mastery of models improves your ability to reason, explain, design, communicate, act, predict, and explore." (Scott E Page, "The Model Thinker", 2018)

"Data storytelling can be defined as a structured approach for communicating data insights using narrative elements and explanatory visuals." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Beyond the design of individual charts, the sequence of data visualizations creates grammar within the exposition. Cohesive visualizations follow common narrative structures to fully express their message. Order matters." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

📉Graphical Representation: Continuity (Just the Quotes)

"In certain respects, line graphs are uniquely applicable to particular graphic requirements for which a bar or circle chart could not be substituted. Strictly speaking, the line graph must be used to portray changes in a continuous variable, since technically such a variable must be represented by a line and not by 'points' or 'bars'. Line graphs are often uniquely applicable to problems of analysis, particularly when it is essential to visualize a trend, observe the behavior of a set of variables through time, or portray the same variable in differing time periods." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Although in most cases the actual value designated by a bar is determined by the location of the end of the bar, many people associate the length or area of the bar with its value. As long as the scale is linear, starts at zero, is continuous, and the bars are the same width, this presents no problem. When any of these conditions are changed, the potential exists that the graph will be misinterpreted." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"Use of a histogram should be strictly reserved for continuous numerical data or for data that can be effectively modelled as continuous […]. Unlike bar charts, therefore, the bars of a histogram corresponding to adjacent intervals should not have gaps between them, for obvious reasons." (Alan Graham, "Developing Thinking in Statistics", 2006)

"When it comes to drawing a picture of continuous data, you need to think through carefully where one interval ends and the next one begins. Failing to do this can result in overlaps or gaps between adjacent intervals, which can cause confusion." (Alan Graham, "Developing Thinking in Statistics", 2006)
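
One way to avoid the overlaps and gaps Graham warns about is to treat every interval as half-open, [lower, upper), so that a value sitting exactly on a boundary belongs to exactly one bin. The sketch below assumes that convention; the function name `bin_index`, the edges and the sample values are all hypothetical.

```python
from bisect import bisect_right

def bin_index(x, edges):
    """Index i of the half-open interval [edges[i], edges[i+1]) that contains x."""
    if not edges[0] <= x < edges[-1]:
        raise ValueError(f"{x} lies outside [{edges[0]}, {edges[-1]})")
    # bisect_right counts the edges <= x, so subtracting 1 gives the bin's lower edge
    return bisect_right(edges, x) - 1

edges = [0, 10, 20, 30]                 # three adjacent intervals, no gaps
values = [0, 9.99, 10, 19, 20, 29.5]    # includes values exactly on boundaries

counts = [0] * (len(edges) - 1)
for v in values:
    counts[bin_index(v, edges)] += 1
print(counts)
```

A value of exactly 10 falls in the second interval, never in both the first and the second; the upper end (30 here) is excluded, which is a choice that should be stated on the chart.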

"Like a black hole or any similar rent in the warp and woof of space-time, a singularity is a disruption of continuity, a break with the past. It is a point at which everything changes, and a point beyond which we can’t see." (Scott Rosenberg, "Dreaming in Code", 2007)

"The first requirement of a beautiful visualization is that it is novel, fresh, or unique. It is difficult (though not impossible) to achieve the necessary novelty using default formats. In most situations, well-defined formats have well-defined, rational conventions of use: line graphs for continuous data, bar graphs for discrete data, pie graphs for when you are more interested in a pretty picture than conveying knowledge." (Noah Iliinsky, "On Beauty", [in "Beautiful Visualization"] 2010)

"Scatterplots are still the go-to visualization when one is examining relationships between continuous variables. One of the problems with the traditional scatterplot is that all data points are presented as if they are on equal footing. [...] Bubble maps are scatterplots with added dimensions. The most common usage is to add weight to individual data points based on population." (Christopher Lysy, "Developments in Quantitative Data Display and Their Implications for Evaluation", 2013)

"Broadly defined, data means events that are captured and made available for analysis. A data source is a consistent record of these events. And a data product translates this record of events into something that can easily be understood. [...] Data products can be organized and characterized by a series of continuums that describe the nature of the data and how it is presented." (Zach Gemignani et al, "Data Fluency", 2014)

"The law of continuity states that we interpret images so as not to generate abrupt transitions or otherwise create images that are more complex. […] we can arbitrarily fill in the missing elements to complete a pattern. It’s also the case of time series, in which we assume that data points in the future will be a smooth continuation of the past. […] In a line chart, those series with a similar slope (that is, they appear to follow the same direction) are understood as belonging to the same group." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"A histogram represents the frequency distribution of the data. Histograms are similar to bar charts but group numbers into ranges. Also, a histogram lets you show the frequency distribution of continuous data. This helps in analyzing the distribution (for example, normal or Gaussian), any outliers present in the data, and skewness." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"A well-designed graph clearly shows you the relevant end points of a continuum. This is especially important if you’re documenting some actual or projected change in a quantity, and you want your readers to draw the right conclusions. […]" (Daniel J Levitin, "Weaponized Lies", 2017)

10 November 2011

📉Graphical Representation: Index Numbers (Just the Quotes)

"To a very striking degree our culture has become a Statistical culture. Even a person who may never have heard of an index number is affected [...] by [...] of those index numbers which describe the cost of living. It is impossible to understand Psychology, Sociology, Economics, Finance or a Physical Science without some general idea of the meaning of an average, of variation, of concomitance, of sampling, of how to interpret charts and tables." (Carrol D Wright, 1887)

"In any chart where index numbers are used the greatest care should be taken to select as unity a set of conditions thoroughly typical and representative. It is frequently best to take as unity the average of a series of years immediately preceding the years for which a study is to be made. The series of years averaged to represent unity should, if possible, be so selected that they will include one full cycle or wave of fluctuation. If one complete cycle involves too many years, the years selected as unity should be taken in equal number on either side of a year which represents most nearly the normal condition." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)
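
Brinton's advice of taking the average of several years as the base can be sketched in a few lines. The figures are invented, the function name `index_series` is hypothetical, and scaling the base to 100 (rather than to unity, as in the quote) is an assumption that follows the common convention.

```python
def index_series(values, base_years, scale=100.0):
    """Express each year's value relative to the average of the base years."""
    base = sum(values[y] for y in base_years) / len(base_years)
    return {year: scale * v / base for year, v in values.items()}

# Hypothetical annual output figures, for illustration only
output = {1910: 80.0, 1911: 100.0, 1912: 120.0, 1913: 150.0}

# The three years preceding the year under study serve as the base period
idx = index_series(output, base_years=[1910, 1911, 1912])
print(idx)
```

Passing `scale=1.0` gives the series in terms of unity, as Brinton phrases it.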

"The use of two or more amount scales for comparisons of series in which the units are unlike and, therefore, not comparable [...] generally results in an ineffective and confusing presentation which is difficult to understand and to interpret. Comparisons of this nature can be much more clearly shown by reducing the components to a comparable basis as percentages or index numbers." (Rufus R Lutz, "Graphic Presentation Simplified", 1949)

"The economists, of course, have great fun - and show remarkable skill - in inventing more refined index numbers. Sometimes they use geometric averages instead of arithmetic averages (the advantage here being that the geometric average is less upset by extreme oscillations in individual items), sometimes they use the harmonic average. But these are all refinements of the basic idea of the index number [...]" (Michael J Moroney, "Facts from Figures", 1951)
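
Moroney's remark that the geometric average is less upset by extreme items is easy to check on a toy example. The price relatives below are made up; `statistics.geometric_mean` is part of the Python standard library from version 3.8.

```python
from statistics import mean, geometric_mean

# Hypothetical price relatives (current price / base price) for four items;
# the last item has quadrupled while the others barely moved
relatives = [1.0, 1.1, 0.9, 4.0]

arithmetic = mean(relatives)
geometric = geometric_mean(relatives)
print(f"arithmetic: {arithmetic:.3f}, geometric: {geometric:.3f}")
```

The single extreme item pulls the arithmetic average up further than the geometric one, which is the behaviour the quote describes.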

"Index numbers are today one of the most widely used statistical devices…They are used to take the pulse of the economy and they have come to be used as indicators of inflationary or deflationary tendencies." (George Simpson & Fritz Kafka, "Basic Statistics", 1952)

"The great trouble with all business data upon which the statisticians and economists base their forecasts is that they are ancient history before they ever become available. They pertain to conditions which existed some weeks or months previous. The figures for what is going on at the moment in all lines of business are never available. A business index, while of great interest and value, is always historical and never predictive." (Walter E Weld, "How to Chart; Facts from Figures with Graphs", 1959)

"The fact that index numbers attempt to measure changes of items gives rise to some knotty problems. The dispersion of a group of products increases with the passage of time, principally because some items have a long-run tendency to fall while others tend to rise. Basic changes in the demand is fundamentally responsible. The averages become less and less representative as the distance from the period increases." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"A statistical index has all the potential pitfalls of any descriptive statistic - plus the distortions introduced by combining multiple indicators into a single number. By definition, any index is going to be sensitive to how it is constructed; it will be affected both by what measures go into the index and by how each of those measures is weighted." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Once these different measures of performance are consolidated into a single number, that statistic can be used to make comparisons […] The advantage of any index is that it consolidates lots of complex information into a single number. We can then rank things that otherwise defy simple comparison […] Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components. As a result, indices range from useful but imperfect tools to complete charades." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"When using indexes in a data set, using an average aggregation is appropriate as long as you only use it at the individual region, month, and visitor type level (i.e., the lowest granularity of the data). You cannot use an average of the average to represent the total."  (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

09 November 2011

📉Graphical Representation: Inquiry (Just the Quotes)

"Statistics are numerical statements of facts in any department of inquiry, placed in relation to each other; statistical methods are devices for abbreviating and classifying the statements and making clear the relations." (Arthur L Bowley, "An Elementary Manual of Statistics", 1934)

"One of the greatest values of the graphic chart is its use in the analysis of a problem. Ordinarily, the chart brings up many questions which require careful consideration and further research before a satisfactory conclusion can be reached. A properly drawn chart gives a cross-section picture of the situation. While charts may bring out. hidden facts in tables or masses of data, they cannot take the place of careful, analysis. In fact, charts may be dangerous devices when in the hands of those unwilling to base their interpretations upon careful study. This, however, does not detract from their value when they are properly used as aids in solving statistical problems." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"The histogram, with its columns of area proportional to number, like the bar graph, is one of the most classical of statistical graphs. Its combination with a fitted bell-shaped curve has been common since the days when the Gaussian curve entered statistics. Yet as a graphical technique it really performs quite poorly. Who is there among us who can look at a histogram-fitted Gaussian combination and tell us, reliably, whether the fit is excellent, neutral, or poor? Who can tell us, when the fit is poor, of what the poorness consists? Yet these are just the sort of questions that a good graphical technique should answer at least approximately." (John W Tukey, "The Future of Processes of Data Analysis", 1965)

"Statistical techniques do not solve any of the common-sense difficulties about making causal inferences. Such techniques may help organize or arrange the data so that the numbers speak more clearly to the question of causality - but that is all statistical techniques can do. All the logical, theoretical, and empirical difficulties attendant to establishing a causal relationship persist no matter what type of statistical analysis is applied." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The use of statistical methods to analyze data does not make a study any more 'scientific', 'rigorous', or 'objective'. The purpose of quantitative analysis is not to sanctify a set of findings. Unfortunately, some studies, in the words of one critic, 'use statistics as a drunk uses a street lamp, for support rather than illumination'. Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"There are as many types of questions as components in the information." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"Overall [...] everyone also has a need to analyze data. The ability to analyze data is vital in its understanding of product launch success. Everyone needs the ability to find trends and patterns in the data and information. Everyone has a need to ‘discover or reveal (something) through detailed examination’, as our definition says. Not everyone needs to be a data scientist, but everyone needs to drive questions and analysis. Everyone needs to dig into the information to be successful with diagnostic analytics. This is one of the biggest keys of data literacy: analyzing data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution." (Edward R Tufte, "Envisioning Information", 1990)

"Data analysis is rarely as simple in practice as it appears in books. Like other statistical techniques, regression rests on certain assumptions and may produce unrealistic results if those assumptions are false. Furthermore it is not always obvious how to translate a research question into a regression model." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Data analysis [...] begins with a dataset in hand. Our purpose in data analysis is to learn what we can from those data, to help us draw conclusions about our broader research questions. Our research questions determine what sort of data we need in the first place, and how we ought to go about collecting them. Unless data collection has been done carefully, even a brilliant analyst may be unable to reach valid conclusions regarding the original research questions." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

"When displaying information visually, there are three questions one will find useful to ask as a starting point. Firstly and most importantly, it is vital to have a clear idea about what is to be displayed; for example, is it important to demonstrate that two sets of data have different distributions or that they have different mean values? Having decided what the main message is, the next step is to examine the methods available and to select an appropriate one. Finally, once the chart or table has been constructed, it is worth reflecting upon whether what has been produced truly reflects the intended message. If not, then refine the display until satisfied; for example if a chart has been used would a table have been better or vice versa?" (Jenny Freeman et al, "How to Display Data", 2008)

"Data always vary randomly because the object of our inquiries, nature itself, is also random. We can analyze and predict events in nature with an increasing amount of precision and accuracy, thanks to improvements in our techniques and instruments, but a certain amount of random variation, which gives rise to uncertainty, is inevitable." (Alberto Cairo, "The Functional Art", 2011)

"The final step in creating your graphic is to refine it. Step back and look at it with fresh eyes. Is there anything that could be removed? Or anything that should be removed because it is distracting? Consider each element in your figure and question whether it contributes enough to your overall goal to justify its contribution. Also consider whether there is anything that could be represented more clearly. Perhaps you have been so effective at simplifying your graphic that you could now include another point in the same figure. Another method of refinement is to check the placement and alignment of your labels. They should be unobtrusive and clearly indicate which object they refer to. Consistency in fonts and alignment of labels can make the difference between something that is easy and pleasant to read, and something that is cluttered and frustrating." (Felice C Frankel & Angela H DePace, "Visual Strategies", 2012)

"Early exploration of a dataset can be overwhelming, because you don’t know where to start. Ask questions about the data and let your curiosities guide you. […] Make multiple charts, compare all your variables, and see if there are interesting bits that are worth a closer look. Look at your data as a whole and then zoom in on categories and individual data points. […] Subcategories, the categories within categories (within categories), are often more revealing than the main categories. As you drill down, there can be higher variability and more interesting things to see." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"There are myriad questions that we can ask from data today. As such, it’s impossible to write enough reports or design a functioning dashboard that takes into account every conceivable contingency and answers every possible question." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Visual Organizations benefit from routinely visualizing many different types and sources of data. Doing so allows them to garner a better understanding of what’s happening and why. Equipped with this knowledge, employees are able to ask better questions and make better business decisions." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"What would a successful outcome look like? If you only had a limited amount of time or a single sentence to tell your audience what they need to know, what would you say? In particular, I find that these last two questions can lead to insightful conversation." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"A data story starts out like any other story, with a beginning and a middle. However, the end should never be a fixed event, but rather a set of options or questions to trigger an action from the audience. Never forget that the goal of data storytelling is to encourage and energize critical thinking for business decisions." (James Richardson, 2017)

"Creating effective visualizations is hard. Not because a dataset requires an exotic and bespoke visual representation - for many problems, standard statistical charts will suffice. And not because creating a visualization requires coding expertise in an unfamiliar programming language [...]. Rather, creating effective visualizations is difficult because the problems that are best addressed by visualization are often complex and ill-formed. The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Using a question as a title is a great way to guide the audience. The question helps you ensure that your charts respond directly to the question and when they do not, you can remove them. And that is the main point: You need to answer the question. If the data is not conclusive, say so. Give an explanation that relates back to your title and close the loop so that your audience is informed and gets the complete picture included in your analysis." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"What is the purpose of collecting data? People gather and store data for at least three different reasons that I can discern. One reason is that they want to build an arsenal of evidence with which to prove a point or defend an agenda that they already had to begin with. This path is problematic for obvious reasons, and yet we all find ourselves traveling on it from time to time. Another reason people collect data is that they want to feed it into an artificial intelligence algorithm to automate some process or carry out some task. […] A third reason is that they might be collecting data in order to compile information to help them better understand their situation, to answer questions they have in their mind, and to unearth new questions that they didn't think to ask." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"A good visualization can do more than just answer questions; it can help you see that there are other questions you need to answer." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"Data literacy empowers us to know the usage of data and how an algorithm can potentially be misleading, biased, and so forth; data literacy empowers us with the right type of skepticism that is needed to question everything." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"For a chart to be truly insightful, context is crucial because it provides us with the visual answer to an important question - 'compared with what'? No number on its own is inherently big or small – we need context to make that judgement. Common contextual comparisons in charts are provided by time" ('compared with last year...') and place" ('compared with the north...'). With ranking, context is provided by relative performance" ('compared with our rivals...')." (Alan Smith, "How Charts Work: Understand and explain data with confidence", 2022)

📉Graphical Representation: Completeness (Just the Quotes)

"The title for any chart presenting data in the graphic form should be so clear and so complete that the chart and its title could be removed from the context and yet give all the information necessary for a complete interpretation of the data. Charts which present new or especially interesting facts are very frequently copied by many magazines. A chart with its title should be considered a unit, so that anyone wishing to make an abstract of the article in which the chart appears could safely transfer the chart and its title for use elsewhere." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919) 

"A man's judgment cannot be better than the information on which he has based it. Give him no news, or present him only with distorted and incomplete data, with ignorant, sloppy, or biased reporting, with propaganda and deliberate falsehoods, and you destroy his whole reasoning process and make him somewhat less than a man." (Arthur H Sulzberger, [speech] 1948)

"Graphical methodology provides powerful diagnostic tools for conveying properties of the fitted regression, for assessing the adequacy of the fit, and for suggesting improvements. There is seldom any prior guarantee that a hypothesized regression model will provide a good description of the mechanism that generated the data. Standard regression models carry with them many specific assumptions about the relationship between the response and explanatory variables and about the variation in the response that is not accounted for by the explanatory variables. In many applications of regression there is a substantial amount of prior knowledge that makes the assumptions plausible; in many other applications the assumptions are made as a starting point simply to get the analysis off the ground. But whatever the amount of prior knowledge, fitting regression equations is not complete until the assumptions have been examined." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Labels should be complete but succinct. Long and complicated labels will defeat the viewer and therefore the purpose of the graph. Treat a label as a cue to jog the memory or to complete comprehension. Shorten long labels; avoid abbreviations unless they are universally understood; avoid repetition on the same graph. A title, for instance, should not repeat what is already in the axis labels. Be consistent in terminology." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"[…] a graphic with loose, incomplete information that is too verbose, vague or passive can actually impede your audience’s ability to make sense of the information at hand. If the graphic confuses or frustrates the audience, you’re likely to do more harm than good, leave them with more questions than answers and essentially turn them away from your publication." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Perception requires imagination because the data people encounter in their lives are never complete and always equivocal. [...] We also use our imagination and take shortcuts to fill gaps in patterns of nonvisual data. As with visual input, we draw conclusions and make judgments based on uncertain and incomplete information, and we conclude, when we are done analyzing the patterns, that out picture is clear and accurate. But is it?" (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"Information graphics are an essential component of technical communication. Very few technical documents or presentations can be considered complete without graphical elements to present some essential data. Because engineers are visually oriented, graphic aids allow their thoughts and ideas to be better understood by other engineers. Information graphics are essential in presenting data because they simplify the content, offer a visually pleasing alternative to gray text in a proposal or an article, and thereby invite interest." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Good visualization is a winding process that requires statistics and design knowledge. Without the former, the visualization becomes an exercise only in illustration and aesthetics, and without the latter, one of only analyses. On their own, these are fine skills, but they make for incomplete data graphics. Having skills in both provides you with the luxury - which is growing into a necessity - to jump back and forth between data exploration and storytelling." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Just because data is visualized doesn’t necessarily mean that it is accurate, complete, or indicative of the right course of action. Exhibiting a healthy skepticism is almost always a good thing." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"To become a great data analyst, you must be able to identify and deal with incomplete data and work to identify the data quality and accuracy issues in a data set." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Using a question as a title is a great way to guide the audience. The question helps you ensure that your charts respond directly to the question and when they do not, you can remove them. And that is the main point: You need to answer the question. If the data is not conclusive, say so. Give an explanation that relates back to your title and close the loop so that your audience is informed and gets the complete picture included in your analysis." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Data storytelling is a method of communicating information that is custom-fit for a specific audience and offers a compelling narrative to prove a point, highlight a trend, make a sale, or all of the above. [...] Data storytelling combines three critical components, storytelling, data science, and visualizations, to create not just a colorful chart or graph, but a work of art that carries forth a narrative complete with a beginning, middle, and end." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

📉Graphical Representation: Failure (Just the Quotes)

"The essential quality of graphic representations is clarity. If the diagram fails to give a clearer impression than the tables of figures it replaces, it is useless. To this end, we will avoid complicating the diagram by including too much data." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

"Where the values of a series are such that a large part the grid would be superfluous, it is the practice to break the grid thus eliminating the unused portion of the scale, but at the same time indicating the zero line. Failure to include zero in the vertical scale is a very common omission which distorts the data and gives an erroneous visual impression." (Calvin F Schmid, "Handbook of Graphic Presentation", 1954)

"[…] the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies. […] Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"[…] the partial scale break is a weak indicator that the reader can fail to appreciate fully; visually the graph is still a single panel that invites the viewer to see, inappropriately, patterns between the two scales. […] The partial scale break also invites authors to connect points across the break, a poor practice indeed; […]" (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38" (4) 1984)

"When a graph is constructed, quantitative and categorical information is encoded, chiefly through position, size, symbols, and color. When a person looks at a graph, the information is visually decoded by the person's visual system. A graphical method is successful only if the decoding process is effective. No matter how clever and how technologically impressive the encoding, it is a failure if the decoding process is a failure. Informed decisions about how to encode data can be achieved only through an understanding of the visual decoding process, which is called graphical perception." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Confusion and clutter are failures of design, not attributes of information. And so the point is to find design strategies that reveal detail and complexity - rather than to fault the data for an excess of complication. Or, worse, to fault viewers for a lack of understanding. Among the most powerful devices for reducing noise and enriching the content of displays is the technique of layering and separation, visually stratifying various aspects of the data." (Edward R Tufte, "Envisioning Information", 1990)

"What about confusing clutter? Information overload? Doesn't data have to be ‘boiled down’ and  ‘simplified’? These common questions miss the point, for the quantity of detail is an issue completely separate from the difficulty of reading. Clutter and confusion are failures of design, not attributes of information." (Edward R Tufte, "Envisioning Information", 1990)

"Audience boredom is usually a content failure, not a decoration failure." (Edward R Tufte, "The cognitive style of PowerPoint", 2003)

"Diagrams are a means of communication and explanation, and they facilitate brainstorming. They serve these ends best if they are minimal. Comprehensive diagrams of the entire object model fail to communicate or explain; they overwhelm the reader with detail and they lack meaning." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails. Some display methods lead to efficient, accurate decoding, and others lead to inefficient, inaccurate decoding. It is only through scientific study of visual perception that informed judgments can be made about display methods." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations. No matter how great the technology, a dashboard's success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately. Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think." (Stephen Few, "Information Dashboard Design", 2006)

"The Sixth Principle for the analysis and display of data: 'Analytical presentations ultimately stand or fall depending on the quality, relevance, and integrity of their content.' This suggests that the most effective way to improve a presentation is to get better content. It also suggests that design devices and gimmicks cannot salvage failed content." (Edward R Tufte, "Beautiful Evidence", 2006)

"The main goal of data visualization is its ability to visualize data, communicating information clearly and effectively. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex dataset by communicating its key aspects in a more intuitive way. Yet designers often tend to discard the balance between design and function, creating gorgeous data visualizations which fail to serve its main purpose - communicate information." (Vitaly Friedman, "Data Visualization and Infographics", Smashing Magazine, 2008)

"Designing good visual displays with an easy-to-use interactive system is difficult. The designer’s first attempts will usually fail, so it is critical that proposed systems be tested on at least several sets of typical users. These usability tests help the designer iterate to the best possible system." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"To be sure, data doesn’t always need to be visualized, and many data visualizations just plain suck. Look around you. It’s not hard to find truly awful representations of information. Some work in concept but fail because they are too busy; they confuse people more than they convey information [...]. Visualization for the sake of visualization is unlikely to produce desired results - and this goes double in an era of Big Data. Bad is still bad, even and especially at a larger scale." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"The goal of using data visualization to make better and faster decisions may lead people to think that any data visualization that is not immediately understood is a failure. Yes, a good visualization should allow you to see things that you might have missed, and to glean insights faster, but you still have to think." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"The rise of graphicacy and broader data literacy intersects with the technology that makes it possible and the critical need to understand information in ways current literacies fail. Like reading and writing, data literacy must become mainstream to fully democratize information access." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"A perfectly relevant visualization that breaks a few presentation rules is far more valuable - it’s better - than a perfectly executed, beautiful chart that contains the wrong data, communicates the wrong message, or fails to engage its audience. [...] The more relevant a data visualization is to its context, the more forgiving, to a point, we can be about its execution" (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)
