05 December 2011

📉Graphical Representation: Venn Diagrams (Just the Quotes)

"[...] for merely theoretical purposes the rule of formation would be very simple. It would merely be to begin by drawing any closed figure, and then proceed [sic] to draw others, subject to the one condition that each is to intersect once and once only all the existing subdivisions produced by those which had gone before." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)

"[…] it must be noticed that these diagrams do not naturally harmonize with the propositions of ordinary life or ordinary logic. […] The great bulk of the propositions which we commonly meet with are founded, and rightly founded, on an imperfect knowledge of the actual mutual relations of the implied classes to one another. […] one very marked characteristic about these circular diagrams is that they forbid the natural expression of such uncertainty, and are therefore only directly applicable to a very small number of such propositions as we commonly meet with." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)

"[...] we can not readily break up a complicated problem into successive steps which can be taken independently. We have, in fact, to solve the problem first, by determining what are the actual mutual relations of the classes involved, and then to draw the circles to represent this final result; we cannot work step-by-step towards the conclusion by aid of our figures." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)

"Whereas the Eulerian plan endeavoured at once and directly to represent propositions, or relations of class terms to one another, we shall find it best to begin by representing only classes, and then proceed to modify these in some way so as to make them indicate what our propositions have to say. How, then, shall we represent all the subclasses which two or more class terms can produce? Bear in mind that what we have to indicate is the successive duplication of the number of subdivisions produced by the introduction of each successive term, and we shall see our way to a very important departure from the Eulerian conception. All that we have to do is to draw our figures, say circles, so that each successive one which we introduce shall intersect once, and once only, all the subdivisions already existing, and we then have what may be called a general framework indicating every possible combination producible by the given class terms." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)
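The "successive duplication" Venn describes can be checked with a short sketch: each new class term splits every existing subdivision in two, so n terms give 2^n compartments. The function name `compartments` and the term labels are illustrative assumptions, not anything taken from Venn's text.

```python
from itertools import product


def compartments(class_terms):
    """Return every in/out combination of the given class terms.

    Each combination corresponds to one subdivision of the general
    framework: the region inside some figures and outside the others.
    """
    regions = []
    for membership in product([True, False], repeat=len(class_terms)):
        label = " ".join(
            term if inside else f"not-{term}"
            for term, inside in zip(class_terms, membership)
        )
        regions.append(label)
    return regions


# Adding one term at a time doubles the number of subdivisions.
all_terms = ["x", "y", "z", "w"]  # hypothetical class terms
for n in range(1, len(all_terms) + 1):
    print(n, "terms ->", len(compartments(all_terms[:n])), "compartments")
```

The count includes the region outside every figure, which is also one of the compartments of the whole field.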

"We endeavour to employ only symmetrical figures, such as should not only be an aid to reasoning, through the sense of sight, but should also be to some extent elegant in themselves." (John Venn, "Symbolic Logic", 1881)

"At the basis of our Symbolic Logic, however represented, whether by words by letters or by diagrams, we shall always find the same state of things. What we ultimately have to do is to break up the entire field before us into a definite number of classes or compartments which are mutually exclusive and collectively exhaustive." (John Venn, "Symbolic Logic" 2nd Ed., 1894)

"The best way of introducing this question will be to enquire a little more strictly whether it is really classes that we thus represent, or merely compartments into which classes may be put? […] The most accurate answer is that our diagrammatic subdivisions, or for that matter our symbols generally, stand for compartments and not for classes. We may doubtless regard them as representing the latter, but if we do so we should never fail to keep in mind the proviso, 'if there be such things in existence'. And when this condition is insisted upon, it seems as if we expressed our meaning best by saying that what our symbols stand for are compartments which may or may not happen to be occupied." (John Venn, "Symbolic Logic" 2nd Ed., 1894)

"A Venn diagram is a simple representation of the sample space, that is often helpful in seeing 'what is going on'. Usually the sample space is represented by a rectangle, with individual regions within the rectangle representing events. It is often helpful to imagine that the actual areas of the various regions in a Venn diagram are in proportion to the corresponding probabilities. However, there is no need to spend a long time drawing these diagrams - their use is simply as a reminder of what is happening." (Graham Upton & Ian Cook, "Introducing Statistics", 2001)

"Two types of graphic organizers are commonly used for comparison: the Venn diagram and the comparison matrix [...] the Venn diagram provides students with a visual display of the similarities and differences between two items. The similarities between elements are listed in the intersection between the two circles. The differences are listed in the parts of each circle that do not intersect. Ideally, a new Venn diagram should be completed for each characteristic so that students can easily see how similar and different the elements are for each characteristic used in the comparison." (Robert J. Marzano et al, "Classroom Instruction that Works: Research-based strategies for increasing student achievement", 2001)

"The notion of outcomes covering a space is a very useful mental image, as it ties in strongly with the use of Venn diagrams and tables for clarifying the nature of possible events resulting from a trial. There are two important aspects to this. First, when enumerating the various outcomes that comprise an event, the number of (equally likely) outcomes should correspond, visually, with the area of that part of the diagram represented by the event in question - the greater the probability, the larger the area. Secondly, where events overlap (for example, when rolling a die, consider the two events 'getting an even score' and 'getting a score greater than 2'), the various regions in the Venn diagram help to clarify the various combinations of events that might occur." (Alan Graham, "Developing Thinking in Statistics", 2006)
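Graham's die example can be worked through as a minimal sketch, assuming a fair six-sided die. The four regions of the two-event Venn diagram are built with set operations, and each probability is the share of equally likely outcomes in the region. The names `regions` and `probability` are illustrative assumptions.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed fair).
outcomes = set(range(1, 7))

even = {o for o in outcomes if o % 2 == 0}        # 'getting an even score'
greater_than_2 = {o for o in outcomes if o > 2}   # 'getting a score greater than 2'

# The four regions of the two-event Venn diagram.
regions = {
    "even and > 2": even & greater_than_2,
    "even only": even - greater_than_2,
    "> 2 only": greater_than_2 - even,
    "neither": outcomes - (even | greater_than_2),
}


def probability(event):
    """Share of equally likely outcomes that fall inside the event."""
    return Fraction(len(event), len(outcomes))


for name, region in regions.items():
    print(f"{name}: outcomes {sorted(region)}, probability {probability(region)}")
```

The regions are mutually exclusive and together cover the sample space, so their probabilities sum to 1.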

📉Graphical Representation: Tools (Just the Quotes)

"Recognize effective results. Does the type of chart selected give a comprehensive picture of the situation? Does the size of chart and visual aid used satisfy all audience requirements? Do materials meet all reproduction problems? Is the layout well balanced and style of lettering uniform? Does the chart as a whole accurately present the facts? Is the projected idea an effective visual tool?" (Mary E Spear, "Charting Statistics", 1952)

"The grid with the vertical ruling carrying the logarithmic scale and the horizontal ruling carrying the arithmetic scale denoting time is the most common. The reverse may be used, and the horizontal ruling may carry the log scale. Charts of this type are frequently referred to as 'semilog charts'. [...] The full or double log scale (with the log grid carried on both horizontal and vertical rulings) is used mostly for statistical study and economic analysis and is not a good tool for popular presentation of data." (Mary E Spear, "Charting Statistics", 1952)
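A small sketch of why the semilog grid is useful for time series: a quantity growing at a constant rate climbs by ever larger steps on an arithmetic scale, but by equal steps on a logarithmic one, so it plots as a straight line. The 10% growth rate and the starting value of 100 are made-up illustrative figures.

```python
import math

# Hypothetical series: starts at 100 and grows 10% per period.
values = [100 * 1.1 ** t for t in range(6)]

# Step between consecutive periods on an arithmetic scale.
arithmetic_steps = [b - a for a, b in zip(values, values[1:])]

# Step between consecutive periods on a base-10 logarithmic scale.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

for t, (a_step, l_step) in enumerate(zip(arithmetic_steps, log_steps), start=1):
    print(f"period {t}: arithmetic step {a_step:.2f}, log step {l_step:.4f}")
```

Every log step equals log10(1.1), which is what makes equal percentage changes look equal on a semilog chart.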

"Graphic forms help us to perform and influence two critical functions of the mind: the gathering of information and the processing of that information. Graphs and charts are ways to increase the effectiveness and the efficiency of transmitting information in a way that enhances the reader's ability to process that information. Graphics are tools to help give meaning to information because they go beyond the provision of information and show relationships, trends, and comparisons. They help to distinguish which numbers and which ideas are more important than others in a presentation." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"The square has always had a no-nonsense sort of image. Stable, solid, and - well - square. Perhaps that's why it is the shape used in business visuals in those rare cases where a visual is even bothered with. Flip through most business books and you'll find precious few places for your eye to stop and your visual brain to engage. But when you do, the shape of the graphic, chart, matrix, table, or diagram is certainly square. It's a comfortable shape, which makes it a valuable implement in your kit of visual communication tools." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"The triangle is one of the best tools for visualizing a problem. Every difficult problem I've encountered in business breaks down into pieces, which carry different weight and importance. The pieces with the most importance sit at the top of the triangle, which progresses down to the sometimes thorny but less important piece at the base." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"Visual thinking can begin with the three basic shapes we all learned to draw before kindergarten: the triangle, the circle, and the square. The triangle encourages you to rank parts of a problem by priority. When drawn into a triangle, these parts are less likely to get out of order and take on more importance than they should. While the triangle ranks, the circle encloses and can be used to include and/or exclude. Some problems have to be enclosed to be managed. Finally, the square serves as a versatile problem-solving tool. By assigning it attributes along its sides or corners, we can suddenly give a vague issue a specific place to live and to move about." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"When visualization tools act as a catalyst to early visual thinking about a relatively unexplored problem, neither the semantics nor the pragmatics of map signs is a dominant factor. On the other hand, syntactics (or how the sign-vehicles, through variation in the visual variables used to construct them, relate logically to one another) are of critical importance." (Alan M MacEachren, "How Maps Work: Representation, Visualization, and Design", 1995)

"Good numeric representation is a key to effective thinking that is not limited to understanding risks. Natural languages show the traces of various attempts at finding a proper representation of numbers. [...] The key role of representation in thinking is often downplayed because of an ideal of rationality that dictates that whenever two statements are mathematically or logically the same, representing them in different forms should not matter. Evidence that it does matter is regarded as a sign of human irrationality. This view ignores the fact that finding a good representation is an indispensable part of problem solving and that playing with different representations is a tool of creative thinking." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"To analyze means to untangle. Even when we 'let the data speak for themselves', we need to untangle some aspect of the data before displaying things in a graphic. The more analytics we can include in the process of displaying graphics, the more flexibility our tools will have." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Graphics, charts, and maps aren’t just tools to be seen, but to be read and scrutinized. The first goal of an infographic is not to be beautiful just for the sake of eye appeal, but, above all, to be understandable first, and beautiful after that; or to be beautiful thanks to its exquisite functionality." (Alberto Cairo, "The Functional Art", 2011)

"The first and main goal of any graphic and visualization is to be a tool for your eyes and brain to perceive what lies beyond their natural reach." (Alberto Cairo, "The Functional Art", 2011)

"[...] communicating with data is less often about telling a specific story and more like starting a guided conversation. It is a dialogue with the audience rather than a monologue. While some data presentations may share the linear approach of a traditional story, other data products (analytical tools, in particular) give audiences the flexibility for exploration. In our experience, the best data products combine a little of both: a clear sense of direction defined by the author with the ability for audiences to focus on the information that is most relevant to them. The attributes of the traditional story approach combined with the self-exploration approach leads to the guided safari analogy." (Zach Gemignani et al, "Data Fluency", 2014)

"Creating a data fluent organization doesn’t just happen. It starts with people who love using data as a tool to improve their job performance - people who have learned to converse with others in the language of data. It needs people who expect and demand better, more useful data products from themselves and others. It starts with you." (Zach Gemignani et al, "Data Fluency", 2014)

"Key Performance Indicators (KPIs) in many organizations are a broken tool. The KPIs are often a random collection prepared with little expertise, signifying nothing. [...] KPIs should be measures that link daily activities to the organization’s critical success factors (CSFs), thus supporting an alignment of effort within the organization in the intended direction." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"There is a story in your data. But your tools don’t know what that story is. That’s where it takes you - the analyst or communicator of the information - to bring that story visually and contextually to life." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"Commonly, data do not make a clear and unambiguous statement about our world, often requiring tools and methods to provide such clarity. These methods, called statistical data analysis, involve collecting, manipulating, analyzing, interpreting, and presenting data in a form that can be used, understood, and communicated to others." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"Exploring data generates hypotheses about patterns in our data. The visualizations and tools of dynamic interactive graphics ease and improve the exploration, helping us to 'see what our data seem to say'." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"A performance dashboard is a practical tool to improve management effectiveness and efficiency, not just a pretty retrospective picture in an annual report." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"Color is difficult to use effectively. A small number of well-chosen colors can be highly distinguishable, particularly for categorical data, but it can be difficult for users to distinguish between more than a handful of colors in a visualization. Nonetheless, color is an invaluable tool in the visualization toolbox because it is a channel that can carry a great deal of meaning and be overlaid on other dimensions. […] There are a variety of perceptual effects, such as simultaneous contrast and color deficiencies, that make precise numerical judgments about a color scale difficult, if not impossible." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Maps also have the disadvantage that they consume the most powerful encoding channels in the visualization toolbox - position and size - on an aspect that is held constant. This leaves less effective encoding channels like color for showing the dimension of interest." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

04 December 2011

📉Graphical Representations: Dashboards (Just the Quotes)

"The real value of dashboard products lies in their ability to replace hunt-and-peck data-gathering techniques with a tireless, adaptable, information-flow mechanism. Dashboards transform data repositories into consumable information." (Gregory L Hovis, "Stop Searching for Information - Monitor it with Dashboard Technology," DM Direct, 2002)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"Dashboards aren't all that different from some of the other means of presenting information, but when properly designed the single-screen display of integrated and finely tuned data can deliver insight in an especially powerful way." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"An effective dashboard is the product not of cute gauges, meters, and traffic lights, but rather of informed design: more science than art, more simplicity than dazzle. It is, above all else, about communication." (Stephen Few, "Information Dashboard Design", 2006)

"Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations. No matter how great the technology, a dashboard's success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately. Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think." (Stephen Few, "Information Dashboard Design", 2006) 

"Having a purposeless or poorly performing dashboard is more common than not. This happens when the underlying architecture is not designed properly to support the needs of dashboard interaction. There is an obvious disconnect between the design of the data warehouse and the design of the dashboards. The people who design the data warehouse do not know what the dashboard will do; and the people who design the dashboards do not know how the data warehouse was designed, resulting in a lack of cohesion between the two. A similar disconnect can also exist between the dashboard designer and the business analyst, resulting in a dashboard that may look beautiful and dazzling but brings very little business value." (Nils H Rasmussen et al, "Business Dashboards: A visual catalog for design and deployment", 2009)

"In general, it still holds true that 'there is no such thing as a free lunch'. What this means is that the most advanced dashboard solutions with the most features and flexibility are generally also the technologies that require more setup and more skill sets from the administrators and the end users. In some cases companies 'dumb down' their dashboard application in the initial stages of deployment so as not to scare their users with too many options. Later, when a dashboard culture has developed, they open up more of the functionality." (Nils H Rasmussen et al, "Business Dashboards: A visual catalog for design and deployment", 2009)

"There are myriad questions that we can ask from data today. As such, it’s impossible to write enough reports or design a functioning dashboard that takes into account every conceivable contingency and answers every possible question." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"A dashboard is like the executive summary of a report. We read executive summaries and skip the body of the report if the summary is more or less in line with our expectations. Trouble is, measurement is never exhaustive. It is only when we dive in that we realize what areas may have been missed." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"[…] an overall green status indicator doesn’t mean anything most of the time. All it says is that the things under measurement seem okay. But there always will be many more things not under measurement. To celebrate green indicators is to ignore the unknowns. […] The tendency to roll up metrics into dashboards promotes ignorance of the real situation on the ground. We forget that we only see what is under measurement. We only act when something is not green." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"Rolling up fine-grained metrics to create high-level dashboards puts pressure on teams to keep the fine-grained metrics green even when it might not be the best use of their time." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"A performance dashboard is a practical tool to improve management effectiveness and efficiency, not just a pretty retrospective picture in an annual report." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"All human storytellers bring their subjectivity to their narratives. All have bias, and possibly error. Acknowledging and defusing that bias is a vital part of successfully using data stories. By debating a data story collaboratively and subjecting it to critical thinking, organizations can get much higher levels of engagement with data and analytics and impact their decision making much more than with reports and dashboards alone." (James Richardson, 2017)

"Dashboards are a type of multiform visualization used to summarize and monitor data. These are most useful when proxies have been well validated and the task is well understood. This design pattern brings a number of carefully selected attributes together for fast, and often continuous, monitoring - dashboards are often linked to updating data streams. While many allow interactivity for further investigation, they typically do not depend on it. Dashboards are often used for presenting and monitoring data and are typically designed for at-a-glance analysis rather than deep exploration and analysis." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Infographics combine art and science to produce something that is not unlike a dashboard. The main difference from a dashboard is the subjective data and the narrative or story, which enhances the data-driven visual and engages the audience quickly through highlighting the required context." (Travis Murphy, "Infographics Powered by SAS®: Data Visualization Techniques for Business Reporting", 2018)

"Dashboards are collections of several linked visualizations all in one place. The idea is very popular as part of business intelligence: having current data on activity summarized and presented all in one place. One danger of cramming a lot of disparate information into one place is that you will quickly hit information overload. Interactivity and small multiples are definitely worth considering as ways of simplifying the information a reader has to digest in a dashboard. As with so many other visualizations, layering the detail for different readers is valuable." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"[Dashboards] are popular methods for displaying multiple visualizations and statistical information. Dashboards often take the form of some organizational instrument that offers both at-a-glance and detailed views of many different analytical and information dimensions. Dashboards are not a unique chart type themselves, but rather should be considered compositions that comprise multiple chart types." (Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019)

"Understanding the entire data ecosystem, from the production of a data point to its consumption in a dashboard or a visualization, provides the ability to invoke action, which is more valuable than the mere sum of its parts." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"A well-designed dashboard needs to provide a similar experience; information cannot be placed just anywhere on the dashboard. Charts that relate to one another are usually positioned close to one another. Important charts often appear larger and more visually prominent than less important ones. In other words, there are natural sizes for how a dashboard comprises charts based on the task and context." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"As we enter into certain types of analytical conversations, we expect the conversations to flow in a predictable and cohesive manner. A KPI dashboard, for example, uses redundant structures across specific dimensions or measures to convey information. A dashboard with a top-down exposition style provides high-level information first and clarifies downward, while a bottom-up dashboard starts with the details and clarifies them against the larger picture." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Chart choices can also create weight within the entire composition. Presenting information as a comprehensive visualization, such as in a dashboard, requires thinking beyond individual charts. In writing, we not only craft sentences, but write the composition as an entire piece. Certain sentences may drive the writing more, but all sentences play a role in conveying the message." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"The sizes of charts in space reflect how we convey information to a reader. In a dashboard context, the content, size, and space that the various charts occupy should reflect the form and function of the main message. As you saw with the bento box metaphor from the introduction, there needs to be deliberate thought put into the placement and size of each individual chart so that they all work together in harmony." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"When integrating written text with charts in a functionally aesthetic way, the reader should be able to find the key takeaways from the chart or dashboard, taking into account the context, constraints, and reading objectives of the overall message." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

📉Graphical Representation: Information Design (Just the Quotes)

"The ducks of information design are false escapes from flatland, adding pretend dimensions to impoverished data sets, merely fooling around with information." (Edward R Tufte, "Envisioning Information", 1990)

"We envision information in order to reason about, communicate, document, and preserve that knowledge - activities nearly always carried out on two-dimensional paper and computer screen. Escaping this flatland and enriching the density of data displays are the essential tasks of information design." (Edward R Tufte, "Envisioning Information", 1990)

"Good information design is clear thinking made visible, while bad design is stupidity in action." (Edward Tufte, "Visual Explanations", 1997)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"Information design is defined as the art and science of preparing information so that it can be used by human beings with efficiency and effectiveness. Its primary objectives are: To develop documents that are comprehensible, rapidly and accurately retrievable, and easy to translate into effective actions [...]" (Sheila Pontis, "La historia de la esquematica en la visualization de datos", 2007)

"I feel that every day, all of us now are being blasted by information design. It's being poured into our eyes through the Web, and we're all visualizers now; we're all demanding a visual aspect to our information. There's something almost quite magical about visual information. It's effortless; it literally pours in." (David McCandless, "The beauty of data visualization", TEDGlobal, 2010) 

"The composing of intelligible patterns from the noise of raw data is a hallmark of a good information designer. The most successful examples extract and present essential relationships in a coherent manner while limiting the obtrusiveness of accessory relationships. Effective results are self-evident whereby the information graphic is absorbed by the mind holistically." (William A Anderson & William M Bevington, "Complications and Adjacencies: An Organizing Logic for Information Graphics", Parsons Journal of Information Mapping Vol. II(3), 2010)

"Information design, when successful - whether in print, on the web, or in the environment - represents the functional balance of the meaning of the information, the skills and inclinations of the designer, and the perceptions, education, experience, and needs of the audience." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"Successful information design in movement systems gives the user the information he needs - and only the information he needs - at every decision point." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012) 

"Information design is a design practice concerned with the presentation of information. It is often associated with the activities of data visualization; indeed sometimes it is presented as the major field in which data visualization belongs. Unquestionably, both share an underlying motive to facilitate understanding. However, in my view, information design has a much broader application concerned with the design of many different forms of visual communication, particularly those with an instructional or functional slant, such as way-finding devices like hospital building maps or in the design of utility bills." (Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019)

03 December 2011

📉Graphical Representation: Charts vs. Thousand Words (Just the Quotes)

"The drawing shows me at a glance what would be spread over ten pages in a book." (Ivan Turgenev, 1862) [2]

"Sometimes, half a dozen figures will reveal, as with a lightning-flash, the importance of a subject which ten thousand labored words with the same purpose in view, had left at last but dim and uncertain." (Mark Twain, "Life on the Mississippi", 1883)

"One good picture is worth many pages of written description." (William Sproston Caine, 1891) [2]

"One look is worth a thousand words" (Kathleen Caffyn, 1903) 

"Use a picture. It's worth a thousand words." (Arthur Brisbane, The Post-Standard, 1911)

"One Look Is Worth A Thousand Words" ([advertisement] 1913)

"A picture is worth ten thousand words. If you can’t see the truth in these pictures you are among the vast majority that must learn only by experience." (Arthur Brisbane, 1915)

"One picture is worth ten thousand words." (Frederick R Barnard, Printer’s Ink, 1921)

"One Picture Worth Ten Thousand Words" ([Chinese proverb] 1927)

"In many instances, a picture is indeed worth a thousand words. To make this true in more diverse circumstances, much more creative effort is needed to pictorialize the output from data analysis. Naive pictures are often extremely helpful, but more sophisticated pictures can be both simple and even more informative." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"Graphic charts are ways of presenting quantitative as well as qualitative information in an efficient and effective visual form. Numbers and ideas presented graphically are often more easily understood. remembered. and integrated than when they are presented in narrative or tabular form. Descriptions. trends. relationships, and comparisons can be made more apparent. Less time is required to present and comprehend information when graphic methods are employed. As the old truism states, 'One picture is worth a thousand words.'" (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"One word is worth a thousand pictures. If it's the right word." (Edward Abbey, "Beyond the Wall: Essays from the Outside", 1984)

"A picture may be worth a thousand words, a formula is worth a thousand pictures." (Edsger Dijkstra, [conference at ETH Zurich] 1994)

"A magnificent picture is never worth a thousand perfect words." (John Dunning, "The Bookman's Wake", 1995)

"A picture tells a thousand words. But you get a thousand pictures from someone's voice." (Paul Fleischman, "Seek", 2001)

"If a picture is worth a thousand words, a metaphor is worth a thousand pictures." (Daniel H Pink, "A Whole New Mind: Why Right-Brainers Will Rule the Future", 2005)

"The amount of information rendered in a single financial graph is easily equivalent to thousands of words of text or a page-sized table of raw values. A graph illustrates so many characteristics of data in a much smaller space than any other means. Charts also allow us to tell a story in a quick and easy way that words cannot." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Visual reports exploit the idea that a picture is worth a thousand words and, in particular, for many tasks a picture is more useful than a large table of numbers." (Stephen G Eick, "Graph Drawing for Data Analytics" [in "Handbook of Graph Drawing and Visualization"] , 2013)

"Graphs can help us interpret data and draw inferences. They can help us see tendencies, patterns, trends, and relationships. A picture can be worth not only a thousand words, but a thousand numbers. However, a graph is essentially descriptive - a picture meant to tell a story. As with any story, bumblers may mangle the punch line and the dishonest may lie." (Gary Smith, "Standard Deviations", 2014)

"The caption should explain what is shown, possibly also giving the data source. Captions should be detailed enough that the graphic can pretty well stand on its own. Longer is usually better than shorter. A picture may be worth a thousand words, but you need at least some words to describe and explain it." (Antony Unwin, "Graphical Data Analysis with R", 2015)

"A picture may be worth a thousand words, but not all pictures are readable, interpretable, meaningful, or relevant." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"A recurring theme in machine learning is combining predictions across multiple models. There are techniques called bagging and boosting which seek to tweak the data and fit many estimates to it. Averaging across these can give a better prediction than any one model on its own. But here a serious problem arises: it is then very hard to explain what the model is (often referred to as a 'black box'). It is now a mixture of many, perhaps a thousand or more, models." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"'A picture is worth a thousand words' is definitely true, and graphs can help you tell a story about your data that would otherwise go untold with only numerical summaries and statistics. While inferential statistics and effect size measures can help us draw relatively reliable conclusions from our data, graphs and visualizations can help make the scientific findings accessible to virtually anyone, even with minimal coursework in statistics or data science." (Daniel J Denis, "Univariate, Bivariate, and Multivariate Statistics Using R: Quantitative Tools for Data Analysis and Data Science, 2020)

"Although a picture may be worth a thousand words, a single static picture is in most cases insufficient for a valid analysis and for understanding of a complex subject. It is usual that an analyst needs to see different aspects or parts of data and look at the data from different perspectives. This means that the analyst needs to interact with the data and with the system that generates visual displays of the data: select data components and subsets for viewing, select and tune visualization techniques, transform the views, transform the data, and so on." (Natalia Andrienko et al, "Visual Analytics for Data Scientists", 2020)

"A picture really can be worth a thousand words, and human beings are adept at extracting useful information from visual presentations. Modern data analysis increasingly relies on graphical presentations to uncover meaning and convey results." (Robert I Kabacoff, "R in Action: Data analysis and graphics with R and Tidyverse", 2022)

"A good metaphor is worth a thousand pictures." (Anon) 

"As the Chinese say, 1001 words is worth more than a picture." (John McCarthy [source]) 

References:
[1] Wikipedia (2024) A picture is worth a thousand words [link]
[2] Quote Investigator (2022) A Picture Is Worth Ten Thousand Words [link]


💠SQL Server: Window Functions [new feature]

Introduction

     In the past, whether on their own or alongside other techniques, aggregate functions proved quite useful for solving problems that involve retrieving the first/last record or displaying details together with averages and other aggregates. Typically their use involves joining a dataset with an aggregation based on the same dataset or a subset of it. An aggregation can involve one or more columns that are the object of the analysis. Sometimes multiple such aggregations are needed, each based on a different set of columns, and each such aggregation involves at least one join. Such queries can become quite complex, though that was the price to pay in order to solve such problems.

Partitions

     The introduction of analytic functions in Oracle and of window functions, a similar concept, in SQL Server, allowed such problems to be approached from a different, simplified perspective. Central to this feature is the partition (of a dataset), which has the same meaning as the mathematical partition of a set: a division of a set into non-overlapping and non-empty parts that cover the whole initial set. Partitions are not necessarily something new, as the columns used in a GROUP BY clause (implicitly) determine a partition of a dataset. The difference in analytic/window functions is that the partition is defined explicitly, inline, together with the ranking or aggregate function evaluated within each partition. If the concept of partition is difficult to grasp, let’s look at the result-set based on two Products (the examples are based on the AdventureWorks database):
 
-- Price Details for 2 Products 
SELECT A.ProductID  
, A.StartDate 
, A.EndDate 
, A.StandardCost  
FROM [Production].[ProductCostHistory] A 
WHERE A.ProductID IN (707, 708) 
ORDER BY A.ProductID 
, A.StartDate 

window function - details

   In this case one partition is “created” based on the first Product (ProductId = 707), while a second partition is based on the second Product (ProductId = 708). As a side note, another partitioning could be created based on ProductId and StartDate; considering that the two attributes form a key in the table, this would split the dataset into partitions of exactly one record each.
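
   The two partitionings described above can be made visible by counting the records in each partition. The query below is a minimal sketch against the same AdventureWorks table; the column aliases RecordsPerProduct and RecordsPerProductAndDate are names chosen here for illustration only. Assuming that ProductId and StartDate form a key, as stated above, the second count should be 1 on every row.

```sql
-- Partition sizes for 2 Products (illustrative sketch)
SELECT A.ProductID
, A.StartDate
, A.StandardCost
  -- number of records in the partition defined by ProductID
, COUNT(*) OVER(PARTITION BY A.ProductID) RecordsPerProduct
  -- number of records in the partition defined by ProductID and StartDate
, COUNT(*) OVER(PARTITION BY A.ProductID, A.StartDate) RecordsPerProductAndDate
FROM [Production].[ProductCostHistory] A
WHERE A.ProductID IN (707, 708)
ORDER BY A.ProductID
, A.StartDate
```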

Details and Averages

     In order to exemplify the use of simple versus window aggregate functions, let’s consider a problem in which the Standard Cost details need to be displayed together with the Average Standard Cost for each ProductId. When a GROUP BY clause is applied in order to retrieve the Average Standard Cost, the query takes the form:

-- Average Price for 2 Products 
SELECT A.ProductID  
, AVG(A.StandardCost) AverageStandardCost 
FROM [Production].[ProductCostHistory] A 
WHERE A.ProductID IN (707, 708) 
GROUP BY A.ProductID  
ORDER BY A.ProductID 

window function - GROUP BY 

    In order to retrieve the details as well, the query can be written with the help of a JOIN against the aggregation, as follows:

-- Price Details with Average Price for 2 Products - using JOINs 
SELECT A.ProductID  
, A.StartDate 
, A.EndDate 
, A.StandardCost 
, B.AverageStandardCost 
, A.StandardCost - B.AverageStandardCost DiffStandardCost 
FROM [Production].[ProductCostHistory] A    
  JOIN ( -- average price        
    SELECT A.ProductID         
    , AVG(A.StandardCost) AverageStandardCost         
    FROM [Production].[ProductCostHistory] A        
    WHERE A.ProductID IN (707, 708)        
    GROUP BY A.ProductID      
) B  
    ON A.ProductID = B.ProductID 
WHERE A.ProductID IN (707, 708) 
ORDER BY A.ProductID 
, A.StartDate 

 window function - Average Price JOIN   

    As pointed out above, the partition is defined by ProductId. The same query written with window functions becomes:

-- Price Details with Average Price for 2 Products - using AVG window function 
SELECT A.ProductID  
, A.StartDate 
, A.EndDate 
, A.StandardCost 
, AVG(A.StandardCost) OVER(PARTITION BY A.ProductID) AverageStandardCost 
, A.StandardCost - AVG(A.StandardCost) OVER(PARTITION BY A.ProductID) DiffStandardCost 
FROM [Production].[ProductCostHistory] A 
WHERE A.ProductID IN (707, 708) 
ORDER BY A.ProductID 
, A.StartDate 

window function - Average Price WF

    As can be seen in the second example, the AVG function is defined using the OVER clause with ProductID as partition. Moreover, the function is used in a formula to calculate the difference between each Standard Cost and the average (DiffStandardCost). More complex formulas can be written making use of multiple window functions.
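
    As an illustration of a formula built from several window functions, the sketch below combines MIN, MAX and AVG over the same partition. The aliases (MinStandardCost, MaxStandardCost, RangeStandardCost, RatioToAverage) are assumptions made for this example and do not come from AdventureWorks; NULLIF is used only as a guard against a division by zero.

```sql
-- Price Details with several aggregates for 2 Products - using multiple window functions (sketch)
SELECT A.ProductID
, A.StartDate
, A.StandardCost
, MIN(A.StandardCost) OVER(PARTITION BY A.ProductID) MinStandardCost
, MAX(A.StandardCost) OVER(PARTITION BY A.ProductID) MaxStandardCost
  -- spread of the cost within the partition
, MAX(A.StandardCost) OVER(PARTITION BY A.ProductID)
  - MIN(A.StandardCost) OVER(PARTITION BY A.ProductID) RangeStandardCost
  -- cost relative to the partition's average (NULL when the average is 0)
, A.StandardCost / NULLIF(AVG(A.StandardCost) OVER(PARTITION BY A.ProductID), 0) RatioToAverage
FROM [Production].[ProductCostHistory] A
WHERE A.ProductID IN (707, 708)
ORDER BY A.ProductID
, A.StartDate
```

    Written with joins, each of these aggregates would need its own column in the aggregated subquery, while here each one is declared inline next to the detail columns.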

The Last Record

     Let’s consider the problem of retrieving the nth record. Because the first or last record is easier to retrieve with aggregate functions, let’s suppose that the last Standard Price is needed for each ProductId. The MAX aggregate function retrieves the greatest Start Date, which in turn identifies the record containing the Last Standard Price.

-- Last Price Details for 2 Products - using JOINs 
SELECT A.ProductID  
, A.StartDate 
, A.EndDate 
, A.StandardCost 
FROM [Production].[ProductCostHistory] A  
    JOIN ( -- last start date          
    SELECT A.ProductID          
    , MAX(A.StartDate) LastStartDate          
    FROM [Production].[ProductCostHistory] A          
    WHERE A.ProductID IN (707, 708)          
    GROUP BY A.ProductID      
) B      
   ON A.ProductID = B.ProductID  
  AND A.StartDate = B.LastStartDate 
WHERE A.ProductID IN (707, 708) 
ORDER BY A.ProductID 
, A.StartDate 

window function - Last Price JOIN  

With window functions the query can be rewritten as follows:

-- Last Price Details for 2 Products - using RANK window function 
SELECT * 
FROM (-- ordered prices      
    SELECT A.ProductID      
    , A.StartDate      
    , A.EndDate      
    , A.StandardCost      
    , RANK() OVER(PARTITION BY A.ProductID ORDER BY A.StartDate DESC) Ranking      
    FROM [Production].[ProductCostHistory] A     
    WHERE A.ProductID IN (707, 708) 
  ) A 
WHERE Ranking = 1 
ORDER BY A.ProductID 
, A.StartDate 

window function - Last Price WF  

   As can be seen, the RANK function was used to retrieve the Last Standard Price, the results being ordered descending by StartDate within each partition. Thus, the Last Standard Price always takes the first position (Ranking = 1). Because window functions can’t be used in WHERE clauses, the initial logic needs to be encapsulated in a subquery. The First Standard Price could be retrieved similarly, this time ordering ascending by StartDate. The last query can be easily modified to retrieve the nth record or the first/last n records, something that can prove more difficult with simple aggregate functions.
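
   As a sketch of the nth-record variant, the query below retrieves the second-to-last Standard Price for each Product. It swaps RANK for ROW_NUMBER, which is a choice made for this example rather than something the queries above require: RANK gives tied rows the same rank, while ROW_NUMBER always numbers the rows of a partition 1, 2, 3 and so on. With ProductId and StartDate forming a key there are no ties, so the two functions should behave the same on this table. The alias RowNumber is illustrative.

```sql
-- Second-to-last Price Details for 2 Products - using ROW_NUMBER window function (sketch)
SELECT A.ProductID
, A.StartDate
, A.EndDate
, A.StandardCost
FROM (-- prices numbered from the most recent to the oldest
    SELECT A.ProductID
    , A.StartDate
    , A.EndDate
    , A.StandardCost
    , ROW_NUMBER() OVER(PARTITION BY A.ProductID ORDER BY A.StartDate DESC) RowNumber
    FROM [Production].[ProductCostHistory] A
    WHERE A.ProductID IN (707, 708)
  ) A
WHERE A.RowNumber = 2 -- n = 2; use RowNumber <= n for the last n records
ORDER BY A.ProductID
```

   A Product with fewer than n records simply returns no row for that position.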

Conclusion

    Without going too deep into details, I have shown above two representative scenarios in which solutions based on aggregate functions could be simplified by using window functions. In theory window functions provide greater flexibility, but they have their own trade-offs too. In the next posts I will attempt to further detail their use, especially in the context of Statistics.

02 December 2011

📉Graphical Representation: Tables (Just the Quotes)

"Information that is imperfectly acquired, is generally as imperfectly retained; and a man who has carefully investigated a printed table, finds, when done, that he has only a very faint and partial idea of what he has read; and that like a figure imprinted on sand, is soon totally erased and defaced." (William Playfair, "The Commercial and Political Atlas", 1786)

"In the course of executing that design, it occurred to me that tables are by no means a good form for conveying such information. [...] Making an appeal to the eye when proportion and magnitude are concerned is the best and readiest method of conveying a distinct idea." (William Playfair, "The Statistical Brewery", 1801)

"Isolated facts, those that can only be obtained by rough estimate and that require development, can only be presented in memoires; but those that can be presented in a body, with details, and on whose accuracy one can rely, may be expounded in tables." (Emmanuel Duvillard, "Memoire sur le travail du Bureau de statistique", 1806)

"Tables are like cobwebs, like the sieve of Danaides; beautifully reticulated, orderly to look upon, but which will hold no conclusion. Tables are abstractions, and the object a most concrete one, so difficult to read the essence of." (Thomas Carlyle, "Chartism", 1840)

"But law is no explanation of anything; law is simply a generalization, a category of facts. Law is neither a cause, nor a reason, nor a power, nor a coercive force. It is nothing but a general formula, a statistical table." (Florence Nightingale, "Suggestions for Thought", 1860)

"The dominant principle which characterizes my graphic tables and my figurative maps is to make immediately appreciable to the eye, as much as possible, the proportions of numeric results. […] Not only do my maps speak, but even more, they count, they calculate by the eye." (Chatles D Minard, "Des tableaux graphiques et des cartes figuratives", 1862) 

"If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at glance the series of phenomena, but also to signal the correspondences or anomalies, to find the causes, to identify the laws." (Émile Cheysson, cca. 1877)

"That the ten digits do not occur with equal frequency must be evident to any one making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones." (Simon Newcomb, "Note on the frequencies of the different digits in natural numbers", Amer. J. Math 4, 1881)

"To a very striking degree our culture has become a Statistical culture. Even a person who may never have heard of an index number is affected [...] by [...] of those index numbers which describe the cost of living. It is impossible to understand Psychology, Sociology, Economics, Finance or a Physical Science without some general idea of the meaning of an average, of variation, of concomitance, of sampling, of how to interpret charts and tables." (Carrol D Wright, 1887)

"Getting information from a table is like extracting sunlight from a cucumber." (Arthur B. Farquhar & Henry Farquhar, "Economic and Industrial Delusions", 1891)

"The graphical method has considerable superiority for the exposition of statistical facts over the tabular. A heavy bank of figures is grievously wearisome to the eye, and the popular mind is as incapable of drawing any useful lessons from it as of extracting sunbeams from cucumbers." (Arthur B Farquhar & Henry Farquhar, "Economic and Industrial Delusions", 1891)

"The essential quality of graphic representations is clarity. If the diagram fails to give a clearer impression than the tables of figures it replaces, it is useless. To this end, we will avoid complicating the diagram by including too much data." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

"Since a table is a collection of certain sets of data, a chart with one curve representing each set of data can be made to take the place of the table. Wherever a chart can be plotted by straight lines, the speed of this is infinitely greater than making out a table, and where the curvilinear law is known, or can be approximated by the use of the empiric law, the speed is but little less." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Although, the tabular arrangement is the fundamental form for presenting a statistical series, a graphic representation - in a chart or diagram - is often of great aid in the study and reporting of statistical facts. Moreover, sometimes statistical data must be taken, in their sources, from graphic rather than tabular records." (William L Crum et al, "Introduction to Economic Statistics", 1938)

"When numbers in tabular form are taboo and words will not do the work well as is often the case. There is one answer left: Draw a picture. About the simplest kind of statistical picture or graph, is the line variety. It is very useful for showing trends, something practically everybody is interested in showing or knowing about or spotting or deploring or forecasting." (Darell Huff, "How to Lie with Statistics", 1954)

"We must emphasize that such terms as 'select at random', 'choose at random', and the like, always mean that some mechanical device, such as coins, cards, dice, or tables of random numbers, is used." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"A statistical table is the logical listing of related quantitative data in vertical columns and horizontal rows of numbers with sufficient explanatory and qualifying words, phrases and statements in the form of titles, headings and notes to make clear the full meaning of data and their origin." (Alva M Tuttle, "Elementary Business and Economic Statistics", 1957)

"However informative and well designed a statistical table may be, as a medium for conveying to the reader an immediate and clear impression of its content, it is inferior to a good chart or graph. Many people are incapable of comprehending large masses of information presented in tabular form; the figures merely confuse them. Furthermore, many such people are unwilling to make the effort to grasp the meaning of such data. Graphs and charts come into their own as a means of conveying information in easily comprehensible form." (Alfred R Ilersic, "Statistics", 1959)

"All the evidence obtained from the reproduction of the study mentioned here indicates that the graphic method is 'better' than the tabular. Tables, since graphs are based on them, are necessary, but they are like background rocks, heavy and uninteresting. Graphs, on the other hand, spice the reports; clarify them, and make them interesting and palatable." (Karl M Dallenbach, 1963)

"The statistician has no magic touch by which he may come in at the stage of tabulation and make something of nothing. Neither will his advice, however wise in the early stages of a study, ensure successful execution and conclusion. Many a study, launched on the ways of elegant statistical design, later boggled in execution, ends up with results to which the theory of probability can contribute little." (W Edwards Deming, "Principles of Professional Statistical Practice", Annals of Mathematical Statistics, 36(6), 1965)

"The problem that still remains to be solved is that of the orderable matrix, that needs the use of imagination […] When the two components of a data table are orderable, the normal construction is the orderable matrix. Its permutations show the analogy and the complementary nature that exist between the algorithmic treatments and the graphical treatments." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"A statistical table is a systematic arrangement of numerical data in columns and rows. Its purpose is to show quantitative facts clearly, concisely, and effectively. It should facilitate an understanding of the logical relationships among the numbers presented. Tables are used in the compilation of raw data, in the summarizing and analytic processes, and in the presentation of statistics in final form. A good table is the product of careful thinking and hard work. It is not just a package of figures put into neat compartments and ruled to make it look more attractive. It contains carefully selected data put together with thought and ingenuity to serve a specific purpose." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"Tables are [...] the backbone of most statistical reports. They provide the basic substance and foundation on which conclusions can be based. They are considered valuable for the following reasons: (1) Clarity - they present many items of data in an orderly and organized way. (2) Comprehension - they make it possible to compare many figures quickly. (3) Explicitness - they provide actual numbers which document data presented in accompanying text and charts. (4) Economy - they save space, and words. (5) Convenience - they offer easy and rapid access to desired items of information." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"We would wish ‘numerate’ to imply the possession of two attributes. The first of these is an ‘at-homeness’ with numbers and an ability to make use of mathematical skills which enable an individual to cope with the practical mathematical demands of his everyday life. The second is ability to have some appreciation and understanding of information which is presented in mathematical terms, for instance in graphs, charts or tables or by reference to percentage increase or decrease." (Cockcroft Committee, "Mathematics Counts: A Report into the Teaching of Mathematics in Schools", 1982)

"The basic principle which should be observed in designing tables is that of grouping related data, either by the use of space or, if necessary, rules. Items which are close together will be seen as being more closely related than items which are farther apart, and the judicious use of space is therefore vitally important. Similarly, ruled lines can be used to relate and divide information, and it is important to be sure which function is required. Rules should not be used to create closed compartments; this is time-wasting and it interferes with scanning." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"We are not saying that the primary purpose of a graph is to convey numbers with as many decimal places as possible. We agree with Ehrenberg (1975) that if this were the only goal, tables would be better. The power of a graph is its ability to enable one to take in the quantitative information, organize it, and see patterns and structure not readily revealed by other means of studying the data." (William Cleveland & Robert McGill, "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Models", Journal of the American Statistical Association 79, 1984)

"The ease and speed with which tables can be understood depends very much on the tabulation logic. The author must ask himself what information the reader already has when he consults a particular table, and what information he is seeking from it. The row and column headings should relate to the information he already has, thus leading him to the information he seeks which is displayed in the body of the table." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"Wherever possible, numerical tables should be explicit rather than implicit, i.e. the information should be given in full. In an implicit table, the reader may be required to add together two values in order to obtain a third which is not explicitly stated in the table. […] Implicit tables save space, but require more effort on the part of the reader and may cause confusion and errors. They are particularly unsuitable for slides and other transient displays." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"This is why a 'web' of notes with links (like references) between them is far more useful than a fixed hierarchical system. When describing a complex system, many people resort to diagrams with circles and arrows. Circles and arrows leave one free to describe the interrelationships between things in a way that tables, for example, do not. The system we need is like a diagram of circles and arrows, where circles and arrows can stand for anything." (Tim Berners-Lee, "Information Management: A Proposal", 1989)

"A good way to evaluate a model is to look at a visual representation of it. After all, what is easier to understand - a table full of mathematical relationships or a graphic displaying a decision tree with all of its splits and branches?" (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis", 2002)

"Computers are able to multiply useless images without taking into account that, by definition, every graphic corresponds to a table. This table allows you to think about three basic questions that go from the particular to the general level. When this last one receives an answer, you have answers for all of them. Understanding means accessing the general level and discovering significant grouping (patterns). Consequently, the function of a graphic is answering the three following questions:
Which are the X,Y, Z components of the data table? (What it’s all about?)
What are the groups in X, in Y that Z builds? (What the information at the general level is?)
What are the exceptions?

These questions can be applied to every kind of problem. They measure the usefulness of whatever construction or graphical invention allowing you to avoid useless graphics." (Jacques Bertin, [interview] 2003)

"Graphs are for the forest and tables are for the trees. Graphs give you the big picture and show you the trends; tables give you the details." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"What distinguishes data tables from graphics is explicit comparison and the data selection that this requires. While a data table obviously also selects information, this selection is less focused than a chart's on a particular comparison. To the extent that some figures in a table are visually emphasised. say in colour or size and style of print. the table is well on its way to becoming a chart. If you're making no comparisons - because you have no particular message and so need no selection (in other words, if you are simply providing a database, number quarry or recycling facility) - tables are easier to use than charts." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"Data visualization [...] expresses the idea that it involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid readers or viewers in seeing the structure in the data. The term data visualization is related to the new field of information visualization. This includes visualization of all kinds of information, not just of data, and is closely associated with research by computer scientists." (Antony Unwin et al, "Introduction" [in "Handbook of Data Visualization"], 2008) 

"Plotting data is a useful first stage to any analysis and will show extreme observations together with any discernible patterns. In addition the relative sizes of categories are easier to see in a diagram (bar chart or pie chart) than in a table. Graphs are useful as they can be assimilated quickly, and are particularly helpful when presenting information to an audience. Tables can be useful for displaying information about many variables at once, while graphs can be useful for showing multiple observations on groups or individuals. Although there are no hard and fast rules about when to use a graph and when to use a table, in the context of a report or a paper it is often best to use tables so that the reader can scrutinise the numbers directly." (Jenny Freeman et al, "How to Display Data", 2008)

"When displaying information visually, there are three questions one will find useful to ask as a starting point. Firstly and most importantly, it is vital to have a clear idea about what is to be displayed; for example, is it important to demonstrate that two sets of data have different distributions or that they have different mean values? Having decided what the main message is, the next step is to examine the methods available and to select an appropriate one. Finally, once the chart or table has been constructed, it is worth reflecting upon whether what has been produced truly reflects the intended message. If not, then refine the display until satisfied; for example if a chart has been used would a table have been better or vice versa?" (Jenny Freeman et al, "How to Display Data", 2008)

"Tables work in a variety of situations because they convey large amounts of data in a condensed fashion. Use tables in the following situations: (1) to structure data so the reader can easily pick out the information desired, (2) to display in a chart when the data contains too many variables or values, and (3) to display exact values that are more important than a visual moment in time." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"The data [in tables] should not be so spaced out that it is difficult to follow or so cramped that it looks trapped. Keep columns close together; do not spread them out more than is necessary. If the columns must be spread out to fit a particular area, such as the width of a page, use a graphic device such as a line or screen to guide the reader’s eye across the row." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"By giving numbers a proper shape, by visually encoding them, the graphic has saved you time and energy that you would otherwise waste if you had to use a table that was not designed to aid your mind." (Alberto Cairo, "The Functional Art", 2011)

"A common mistake is that all visualization must be simple, but this skips a step. You should actually design graphics that lend clarity, and that clarity can make a chart 'simple' to read. However, sometimes a dataset is complex, so the visualization must be complex. The visualization might still work if it provides useful insights that you wouldn’t get from a spreadsheet. […] Sometimes a table is better. Sometimes it’s better to show numbers instead of abstract them with shapes. Sometimes you have a lot of data, and it makes more sense to visualize a simple aggregate than it does to show every data point." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"With fast computers and plentiful data, finding statistical significance is trivial. If you look hard enough, it can even be found in tables of random numbers." (Gary Smith, "Standard Deviations", 2014)

"One thing to keep in mind with a table is that you want the design to fade into the background, letting the data take center stage. Don’t let heavy borders or shading compete for attention. Instead, think of using light borders or simply white space to set apart elements of the table." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"[...] tables interact with our verbal system, graphs interact with our visual system, which is faster at processing information." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"Using a table in a live presentation is rarely a good idea. As your audience reads it, you lose their ears and attention to make your point verbally." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"A useful way to think about tables and graphics is to visualize layers. Just as photographic files may be manipulated in photo editing software using layers, data presentations are constructed by imagining that layers of an image are placed one on top of another. There are three general layers that apply to visual data presentations: (a) a frame that is typically a rectangle or matrix, (b) axes and coordinate systems (for graphics), and (c) data presented as numbers or geometric objects." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Most of us have difficulty figuring probabilities and statistics in our heads and detecting subtle patterns in complex tables of numbers. We prefer vivid pictures, images, and stories. When making decisions, we tend to overweight such images and stories, compared to statistical information. We also tend to misunderstand or misinterpret graphics." (Daniel J Levitin, "Weaponized Lies", 2017)

"Reference tables show a lot of data with a high degree of precision. They are designed generally to provide users with a way to find particular pieces of data. […] Summary tables provide some type of extraction of data from a reference table or a spreadsheet. The data are usually manipulated, analyzed, or summarized in some way, such as by sorting or providing summary statistics (means, percentages, ranges). The results of statistical models are usually presented in research reports using this type of table." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians." (Daniel J Levitin, "Weaponized Lies", 2017)

"The main differences between Bayesian networks and causal diagrams lie in how they are constructed and the uses to which they are put. A Bayesian network is literally nothing more than a compact representation of a huge probability table. The arrows mean only that the probabilities of child nodes are related to the values of parent nodes by a certain formula (the conditional probability tables) and that this relation is sufficient. That is, knowing additional ancestors of the child will not change the formula. Likewise, a missing arrow between any two nodes means that they are independent, once we know the values of their parents. [...] If, however, the same diagram has been constructed as a causal diagram, then both the thinking that goes into the construction and the interpretation of the final diagram change." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"Apart from the technical challenge of working with the data itself, visualization in big data is different because showing the individual observations is just not an option. But visualization is essential here: for analysis to work well, we have to be assured that patterns and errors in the data have been spotted and understood. That is only possible by visualization with big data, because nobody can look over the data in a table or spreadsheet." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"When visuals are applied to data, they can enlighten the audience to insights that they wouldn’t see without charts or graphs. Many interesting patterns and outliers in the data would remain hidden in the rows and columns of data tables without the help of data visualizations. They connect with our visual nature as human beings and impart knowledge that couldn’t be obtained as easily using other approaches that involve just words or numbers." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

01 December 2011

📉Graphical Representation: Percentages (Just the Quotes)

"[…] statistical literacy. That is, the ability to read diagrams and maps; a 'consumer' understanding of common statistical terms, as average, percent, dispersion, correlation, and index number." (Douglas Scates, "Statistics: The Mathematics for Social Problems", 1943)

"Percentages offer a fertile field for confusion. And like the ever-impressive decimal they can lend an aura of precision to the inexact. […] Any percentage figure based on a small number of cases is likely to be misleading. It is more informative to give the figure itself. And when the percentage is carried out to decimal places, you begin to run the scale from the silly to the fraudulent." (Darrell Huff, "How to Lie with Statistics", 1954)

"Charts not only tell what was, they tell what is; and a trend from was to is (projected linearly into the will be) contains better percentages than clumsy guessing." (Robert A Levy, "The Relative Strength Concept of Common Stock Forecasting", 1968)

"We would wish ‘numerate’ to imply the possession of two attributes. The first of these is an ‘at-homeness’ with numbers and an ability to make use of mathematical skills which enable an individual to cope with the practical mathematical demands of his everyday life. The second is ability to have some appreciation and understanding of information which is presented in mathematical terms, for instance in graphs, charts or tables or by reference to percentage increase or decrease." (Cockcroft Committee, "Mathematics Counts: A Report into the Teaching of Mathematics in Schools", 1982) 

"The ease with which somewhat complex statistics can produce confusion is important, because we live in a world in which complex numbers are becoming more common. Simple statistical ideas - fractions, percentages, rates - are reasonably well understood by many people. But many social problems involve complex chains of cause and effect that can be understood only through complicated models developed by experts. [...] environment has an influence. Sorting out the interconnected causes of these problems requires relatively complicated statistical ideas - net additions, odds ratios, and the like. If we have an imperfect understanding of these ideas, and if the reporters and other people who relay the statistics to us share our confusion - and they probably do - the chances are good that we'll soon be hearing - and repeating, and perhaps making decisions on the basis of - mutated statistics." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Precision and recall are ways of monitoring the power of the machine learning implementation. Precision is a metric that monitors the percentage of true positives. […] Recall is the ratio of true positives to true positive plus false negatives." (Matthew Kirk, "Thoughtful Machine Learning", 2015)
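The two ratios in the quote above can be made concrete with a small sketch. It assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)) and binary labels coded as 0/1; the function names and the sample labels are hypothetical and are not taken from the book.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]

tp, fp, fn = confusion_counts(y_true, y_pred)
print(f"precision={precision(tp, fp):.2f}, recall={recall(tp, fn):.2f}")
```

Returning 0.0 when a denominator is zero is a convenience chosen for this sketch; other conventions (raising an error, returning NaN) are equally defensible.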

"The most ubiquitous graph is the pie chart. It is a staple of the business world. [...] Never use a pie chart. Present a simple list of percentages, or whatever constitutes the divisions of the pie chart." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Why does representing information in terms of natural frequencies rather than probabilities or percentages foster insight? For two reasons. First, computational simplicity: The representation does part of the computation. And second, evolutionary and developmental primacy: Our minds are adapted to natural frequencies." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Numbers are often useful in stories because they record a recent change in some amount, or because they are being compared with other numbers. Percentages, ratios and proportions are often better than raw numbers in establishing a context." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"The percentage is one of the best (mathematical) friends a journalist can have, because it quickly puts numbers into context. And it's a context that the vast majority of readers and viewers can comprehend immediately." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"Generally pie charts are to be avoided, as they can be difficult to interpret particularly when the number of categories is greater than five. Small proportions can be very hard to discern […] In addition, unless the percentages in each of the individual categories are given as numbers it can be much more difficult to estimate them from a pie chart than from a bar chart […]." (Jenny Freeman et al, "How to Display Data", 2008)

"Another way to obscure the truth is to hide it with relative numbers. […] Relative scales are always given as percentages or proportions. An increase or decrease of a given percentage only tells us part of the story, however. We are missing the anchoring of absolute values." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Comparisons are the lifeblood of empirical studies. We can’t determine if a medicine, treatment, policy, or strategy is effective unless we compare it to some alternative. But watch out for superficial comparisons: comparisons of percentage changes in big numbers and small numbers, comparisons of things that have nothing in common except that they increase over time, comparisons of irrelevant data. All of these are like comparing apples to prunes." (Gary Smith, "Standard Deviations", 2014)

"How good the data quality is can be looked at both subjectively and objectively. The subjective component is based on the experience and needs of the stakeholders and can differ by who is being asked to judge it. For example, the data managers may see the data quality as excellent, but consumers may disagree. One way to assess it is to construct a survey for stakeholders and ask them about their perception of the data via a questionnaire. The other component of data quality is objective. Measuring the percentage of missing data elements, the degree of consistency between records, how quickly data can be retrieved on request, and the percentage of incorrect matches on identifiers (same identifier, different social security number, gender, date of birth) are some examples." (Aileen Rothbard, "Quality Issues in the Use of Administrative Data Records", 2015)

"Where there is no natural ordering to the categories it can be helpful to order them by size, as this can help you to pick out any patterns or compare the relative frequencies across groups. As it can be difficult to discern immediately the numbers represented in each of the categories it is good practice to include the number of observations on which the chart is based, together with the percentages in each category." (Jenny Freeman et al, "How to Display Data", 2008)

"Reporting numbers as percentages can obscure important changes in net values. […] Percentage calculations can give strange answers when any of the numbers involved are negative." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)
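Both points in the quote above are easy to check with a little arithmetic. The sketch below uses the usual percentage-change formula, (new - old) / old × 100; the function name and all figures are made up for illustration and do not come from the book.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    if old == 0:
        raise ValueError("percentage change is undefined for a zero base")
    return (new - old) * 100.0 / old


# The same net gain of 100 looks very different as a percentage,
# depending on the base it is measured against.
print(pct_change(1000, 1100))  # large base: small percentage
print(pct_change(10, 110))     # small base: huge percentage

# A negative base flips the sign: a result that improves from -100 to 50
# is reported as a *decrease*, even though the value went up by 150.
print(pct_change(-100, 50))
```

Reporting the absolute difference next to the percentage is one simple way to restore the anchoring that the percentage alone hides.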

"While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician." (Sir Arthur C Doyle)

📉Graphical Representation: Dot Plots/Charts (Just the Quotes)

"Dot charts are suggested as replacements for bar charts. The replacements allow more effective visual decoding of the quantitative information and can be used for a wider variety of data sets." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"[...] error bars are more effectively portrayed on dot charts than on bar charts. […] On the bar chart the upper values of the intervals stand out well, but the lower values are visually deemphasized and are not as well perceived as a result of being embedded in the bars. This deemphasis does not occur on the dot chart." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"Pie charts have severe perceptual problems. Experiments in graphical perception have shown that compared with dot charts, they convey information far less reliably. But if you want to display some data, and perceiving the information is not so important, then a pie chart is fine." (Richard Becker & William S Cleveland, "S-Plus Trellis Graphics User's Manual", 1996)

"A bar graph typically presents either averages or frequencies. It is relatively simple to present raw data (in the form of dot plots or box plots). Such plots provide much more information, and they are closer to the original data. If the bar graph categories are linked in some way - for example, doses of treatments - then a line graph will be much more informative. Very complicated bar graphs containing adjacent bars are very difficult to grasp. If the bar graph represents frequencies, and the abscissa values can be ordered, then a line graph will be much more informative and will have substantially reduced chart junk." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The plot tells us the data are granular in the data source, something we could not ascertain with the histogram. There is an important lesson here. Statistics texts and statistical packages that recommend the histogram as the graphical starting point for a data analysis are giving bad advice. The same goes for kernel density estimates. These are appropriate second stages for graphical data analysis. The best starting point for getting a sense of the distribution of a variable is a tally, stem-and-leaf, or a dot plot. A dot plot is a special case of a tally (perhaps best thought of as a delta-neighborhood tally). Once we see that the data are not granular, we may move on to a histogram or kernel density, which smooths the data more than a dot plot." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Area can also make data seem more tangible or relatable, because physical objects take up space. A circle or a square uses more space than a dot on a screen or paper. There’s less abstraction between visual cue and real world." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Visualization is what happens when you make the jump from raw data to bar graphs, line charts, and dot plots. […] In its most basic form, visualization is simply mapping data to geometry and color. It works because your brain is wired to find patterns, and you can switch back and forth between the visual and the numbers it represents. This is the important bit. You must make sure that the essence of the data isn’t lost in that back and forth between visual and the value it represents because if you can’t map back to the data, the visualization is just a bunch of shapes." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Another word of caution for dot plots that show changes over time. The dot plot is, by definition, a summary chart. It does not show all of the data in the intervening years. If the data between the two dots generally move in the same direction, a dot plot is sufficient. But if the data contain sharp variations year by year, a dot plot will obscure that pattern (as it also does for bar charts)." (Jonathan Schwabish, "Better Data Visualizations: A guide for scholars, researchers, and wonks", 2021)

