SQL Troubles: 📉Graphical Representation: Mistakes (Just the Quotes)

13 November 2011

"Many people imagine that graphic charts cannot be understood except by expert mathematicians who have devoted years of study to the subject. This is a mistaken idea, and if instead of passing over charts as if they were something beyond their comprehension more people would make an effort to read them, much valuable time would be saved. It is true that some charts covering technical data are difficult even for an expert mathematician to understand, but this is more often the fault of the person preparing the charts than of the system." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Readers of statistical diagrams should not be required to compare magnitudes in more than one dimension. Visual comparisons of areas are particularly inaccurate and should not be necessary in reading any statistical graphical diagram." (William C Marshall, "Graphical methods for schools, colleges, statisticians, engineers and executives", 1921)

"The art of using the language of figures correctly is not to be over-impressed by the apparent air of accuracy, and yet to be able to take account of error and inaccuracy in such a way as to know when, and when not, to use the figures. This is a matter of skill, judgment, and experience, and there are no rules and short cuts in acquiring this expertness." (Ely Devons, "Essays in Economics", 1961)

"Then there is the audience: will those looking at the new designs be confused? Some of the designs are self-explanatory, as in the case of the range-frame. The dot-dash-plot is more difficult, although it still shows all the standard information found in the scatterplot. Nothing is lost to those puzzled by the frame of dashes, and something is gained by those who do understand. Moreover, it is a frequent mistake in thinking about statistical graphics to underestimate the audience. Instead, why not assume that if you understand it, most other readers will, too? Graphics should be as intelligent and sophisticated as the accompanying text." (Edward R Tufte, "Data-Ink Maximization and Graphical Design", Oikos Vol. 58 (2), 1990)

"Exploratory regression methods attempt to reveal unexpected patterns, so they are ideal for a first look at the data. Unlike other regression techniques, they do not require that we specify a particular model beforehand. Thus exploratory techniques warn against mistakenly fitting a linear model when the relation is curved, a waxing curve when the relation is S-shaped, and so forth." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Many of the applications of visualization in this book give the impression that data analysis consists of an orderly progression of exploratory graphs, fitting, and visualization of fits and residuals. Coherence of discussion and limited space necessitate a presentation that appears to imply this. Real life is usually quite different. There are blind alleys. There are mistaken actions. There are effects missed until the very end when some visualization saves the day. And worse, there is the possibility of the nearly unmentionable: missed effects." (William S Cleveland, "Visualizing Data", 1993)

"[…] an outlier is an observation that lies an 'abnormal' distance from other values in a batch of data. There are two possible explanations for the occurrence of an outlier. One is that this happens to be a rare but valid data item that is either extremely large or extremely small. The other is that it is a mistake – maybe due to a measuring or recording error." (Alan Graham, "Developing Thinking in Statistics", 2006)
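
One way to make "abnormal distance" concrete is the interquartile-range fence rule, sketched below. This is an illustration only, not Graham's method: the function name `tukey_outliers`, the multiplier `k=1.5`, and the sample batch are all assumptions introduced here. Note that the code can only flag a suspect value; deciding whether it is a rare valid item or a recording error still takes judgment.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    The k = 1.5 multiplier is a common convention, assumed here for
    illustration; it is not taken from the quoted text.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical batch: 98 could be a rare valid reading or a typo for 9.8.
batch = [9.1, 9.4, 9.6, 9.8, 10.0, 10.2, 10.3, 10.7, 98]
suspects = tukey_outliers(batch)
print(suspects)
```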

"Histograms are often mistaken for bar charts but there are important differences. Histograms show distribution through the frequency of quantitative values (y axis) against defined intervals of quantitative values (x axis). By contrast, bar charts facilitate comparison of categorical values. One of the distinguishing features of a histogram is the lack of gaps between the bars [...]" (Andy Kirk, "Data Visualization: A successful design process", 2012)
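
The distinction can be seen in how the data behind each chart is prepared. The sketch below is a minimal illustration with made-up data and a hypothetical helper `histogram_counts`: a histogram counts quantitative values inside contiguous intervals, while a bar chart counts (or compares) separate categories.

```python
from collections import Counter


def histogram_counts(values, start, width, bins):
    """Count quantitative values in contiguous, equal-width intervals.

    Interval i covers [start + i * width, start + (i + 1) * width).
    Values outside the covered range are ignored.
    """
    counts = [0] * bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < bins:
            counts[idx] += 1
    return counts


# Histogram input: one quantitative variable, binned into adjacent intervals.
ages = [21, 23, 25, 31, 34, 38, 39, 42, 47, 55]
age_bins = histogram_counts(ages, start=20, width=10, bins=4)

# Bar chart input: one categorical variable, one bar per distinct category.
commute = ["bus", "car", "car", "bike", "bus", "car"]
commute_counts = Counter(commute)

# Text rendering: histogram intervals touch each other, categories do not.
for i, n in enumerate(age_bins):
    print(f"[{20 + i * 10}, {30 + i * 10}) {'#' * n}")
for category, n in commute_counts.most_common():
    print(f"{category:<5} {'#' * n}")
```

Because the intervals are adjacent on a continuous axis, reordering histogram bars would destroy the distribution's shape, whereas bar chart categories can be sorted freely.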

"A common mistake is that all visualization must be simple, but this skips a step. You should actually design graphics that lend clarity, and that clarity can make a chart 'simple' to read. However, sometimes a dataset is complex, so the visualization must be complex. The visualization might still work if it provides useful insights that you wouldn’t get from a spreadsheet. […] Sometimes a table is better. Sometimes it’s better to show numbers instead of abstract them with shapes. Sometimes you have a lot of data, and it makes more sense to visualize a simple aggregate than it does to show every data point." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Most data is linked to time in some way in that it might be a time series, or it’s a snapshot from a specific period. In both cases, you have to know when the data was collected. An estimate made decades ago does not equate to one in the present. This seems obvious, but it’s a common mistake to take old data and pass it off as new because it’s what’s available. Things change, people change, and places change, and so naturally, data changes." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"It’s a mistake to think of data and data visualizations as static terms. They are the very antitheses of stasis." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Most discussions of decision making assume that only senior executives make decisions or that only senior executives' decisions matter. This is a dangerous mistake. Decisions are made at every level of the organization, beginning with individual professional contributors and frontline supervisors. These apparently low-level decisions are extremely important in a knowledge-based organization." (Zach Gemignani et al, "Data Fluency", 2014)

"The most common mistake in ineffective data products is an inability to make difficult decisions about what information is most important. [...] Often information gets included in data products for reasons that are superfluous to the purpose, audience, and message - reasons that cater the product to someone influential or use information that has been included historically. The bar should be higher." (Zach Gemignani et al, "Data Fluency", 2014)

"Sometimes bar charts are avoided because they are common. This is a mistake. Rather, bar charts should be leveraged because they are common, as this means less of a learning curve for your audience." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"There are two kinds of mistakes that an inappropriate inductive bias can lead to: underfitting and overfitting. Underfitting occurs when the prediction model selected by the algorithm is too simplistic to represent the underlying relationship in the dataset between the descriptive features and the target feature. Overfitting, by contrast, occurs when the prediction model selected by the algorithm is so complex that the model fits to the dataset too closely and becomes sensitive to noise in the data." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"The expressiveness principle dictates that the visual encoding should express all of, and only, the information in the dataset attributes. The most fundamental expression of this principle is that ordered data should be shown in a way that our perceptual system intrinsically senses as ordered. Conversely, unordered data should not be shown in a way that perceptually implies an ordering that does not exist. Violating this principle is a common beginner’s mistake in vis." (Tamara Munzner, "Visualization Analysis and Design", 2014)

"A common misconception is that data scientists don’t need visualizations. This attitude is not only inaccurate: it is very dangerous. Most machine learning algorithms are not inherently visual, but it is very easy to misinterpret their outputs if you look only at the numbers; there is no substitute for the human eye when it comes to making intuitive sense of things." (Field Cady, "The Data Science Handbook", 2017)

"In statistics, 'error' is not a synonym for 'mistake', but rather a synonym for 'uncertainty.' Error means that any estimate we make, no matter how precise it looks in our chart or article [...] is usually a middle point of a range of possible values." (Alberto Cairo, "How Charts Lie", 2019)
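
A small sketch of an estimate as the middle point of a range: the sample mean reported together with an approximate interval around it. The function name `mean_with_interval`, the sample values, and the normal-approximation multiplier `z=1.96` (roughly a 95% interval) are assumptions made here for illustration and do not come from Cairo's book.

```python
import math
import statistics


def mean_with_interval(sample, z=1.96):
    """Return (mean, low, high) using a normal-approximation interval.

    z = 1.96 corresponds to roughly 95% coverage under a normal
    approximation; for small samples a t-based multiplier would be wider.
    """
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, mean - z * std_error, mean + z * std_error


# Hypothetical poll-style measurements.
sample = [48, 52, 50, 47, 53]
estimate, low, high = mean_with_interval(sample)
print(f"estimate {estimate:.1f}, plausible range {low:.1f} to {high:.1f}")
```

Reporting the range next to the point estimate, for instance as error bars on a chart, keeps the figure from looking more precise than it is.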

"Numbers can always yield multiple interpretations, and they may be approached from varied angles. We journalists don’t vary our approaches more often because many of us are sloppy, innumerate, or simply forced to publish stories at a quick pace. That’s why chart readers must remain vigilant. Even the most honest chart creator makes mistakes." (Alberto Cairo, "How Charts Lie", 2019)
