12 November 2011

📉Graphical Representation: Expert Perspectives (Just the Quotes)

"Absorb the data. Read it, re-read it, read it backwards and understand the lyrical and human-centred contribution." (Kate McLean) [1]

"Admit that nothing you create on deadline will be perfect. However, it should never be wrong. I try to work by a motto my editor likes to say: 'No Heroics. Your code may not be beautiful, but if it works, it’s good enough.' A visualisation may not have every feature you could possibly want, but if it gets the message across and is useful to people, it’s good enough. Being 'good enough' is not an insult in journalism – it’s a necessity." (Lena Groeger) [1]

"After the data exploration phase you may come to the conclusion that the data does not support the goal of the project. The thing is: data is leading in a data visualization project – you cannot make up some data just to comply with your initial ideas. So, you need to have some kind of an open mind and 'listen to what the data has to say' and learn what its potential is for a visualization. Sometimes this means that a project has to stop if there is too much of a mismatch between the goal of the project and the available data. In other cases this may mean that the goal needs to be adjusted and the project can continue." (Jan Willem Tulp) [1]

"Although all our projects are very much data driven, visualisation is only part of the products and solutions we create. This day and age provides us with amazing opportunities to combine video, animation, visualisation, sound and interactivity. Why not make full use of this? Judging whether to include something or not is all about editing: asking 'is it really necessary?'. There is always an aspect of gut feel or instinct mixed with continuous doubt that drives me in these cases." (Thomas Clever) [1]

"At the beginning, there’s a process of 'interviewing' the data – first evaluating their source and means of collection/aggregation/computation, and then trying to get a sense of what they say – and how well they say it via quick sketches in Excel with pivot tables and charts. Do the data, in various slices, say anything interesting? If I’m coming into this with certain assumptions, do the data confirm them, or refute them?" (Alyson Hurt) [1]

"Context is key. You’ll hear that the most important quality of a visualisation is graphical honesty, or storytelling value, or facilitation of 'insights'. The truth is, all of these things (and others) are the most important quality, but in different times and places. There is no singular function of visualisation; what’s important shifts with the constraints of your audience, goals, tools, expertise, and data and time available." (Scott Murray) [1]

"Data and data sets are not objective; they are creations of human design. Hidden biases in both the collection and analysis stages present considerable risks [in terms of inference]." (Kate Crawford) [1]

"Data inspires me. I always open the data in its native format and look at the raw data just to get the lay of the land. It’s much like looking at a map to begin a journey." (Kim Rees) [1]

"'Everything must have a reason.' A principle that I learned as a graphic designer that still applies to data visualization. In essence, everything needs to be rationalized and have a logic to why it’s in the design/visualization, or it’s out." (Stefanie Posavec) [1]

"Good design is honest. It does not make a product appear more innovative, powerful or valuable than it really is. It does not attempt to manipulate the consumer with promises that cannot be kept." (Dieter Rams) [1]

"I focus on structural exploration on one side and on the reality and the landscape of opportunities in the other […] I try not to impose any early ideas of what the result will look like because that will emerge from the process. In a nutshell I first activate data curiosity, client curiosity, and then visual imagination in parallel with experimentation." (Santiago Ortiz) [1]

"I kick it over into a rough picture as soon as possible. When I can see something then I am able to ask better questions of it – then the what-about-this iterations begin. I try to look at the same data in as many different dimensions as possible. For example, if I have a spreadsheet of bird sighting locations and times, first I like to see where they happen, previewing it in some mapping software. I’ll also look for patterns in the timing of the phenomenon, usually using a pivot table in a spreadsheet. The real magic happens when a pattern reveals itself only when seen in both dimensions at the same time." (John Nelson) [1]

"I say begin by learning about data visualization’s 'black and whites', the rules, then start looking for the greys. It really then becomes quite a personal journey of developing your conviction." (Jorge Camoes) [1]

"I suppose one could say our work has a certain signature. Style, to me, has a negative connotation of 'slapped on' = to prettify something without much meaning. We don’t make it our goal to have a recognisable (visual) signature, instead to create work that truly matters and is unique. Pretty much all our projects are bespoke and have a different end result. That is one of the reasons why we are more concerned with working according to values and principles that transcend individual projects and I believe that is what makes our work recognisable." (Thomas Clever) [1]

"I think this is something I’ve learned from experience rather than advice that was passed on. Less can often be more. In other words, don’t get carried away and try to tell the reader everything there is to know on a subject. Know what it is that you want to show the reader and don’t stray from that. I often find myself asking others 'do we need to show this?' or 'is this really necessary?' Let’s take it out." (Simon Scarr) [1]

"I truly feel that experimentation (even for the sake of experimentation) is important, and I would strongly encourage it. There are infinite possibilities in diagramming and visual communication, so we have much to explore yet. I think a good rule of thumb is to never allow your design or implementation to obscure the reader understanding the central point of your piece. However, I’d even be willing to forsake this, at times, to allow for innovation and experimentation. It ends up moving us all forward, in some way or another." (Kennedy Elliott) [1]

"I’m obsessed with alignments. Sloppy label placement on final files causes my confidence in the designer to flag. What other details haven’t been given full attention? Has the data been handled sloppily as well? [...] On the flip side, clean, layered, and logically built final files are a thing of beauty and my confidence in the designer, and their attention to detail, soars." (Jen Christiansen) [1]

"I’ve come to believe that pure beautiful visual works are somehow relevant in everyday life, because they can become a trigger to get people curious to explore the contents these visuals convey. I like the idea of making people say 'oh that’s beautiful! I want to know what this is about!' I think that probably (or, at least, lots of people pointed that out to us) being Italians plays its role on this idea of 'making things not only functional but beautiful'." (Giorgia Lupi) [1]

"It is easy to immerse yourself in a certain idea, but I think it is important to step back regularly and recognize that other people have different ways of interpreting things. I am very fortunate to work with people whom I greatly admire and who also see things from a different perspective. Their feedback is invaluable in the process." (Jane Pong) [1]

"Look at how other designers solve visual problems (but don’t copy the look of their solutions). Look at art to see how great painters use space, and organise the elements of their pictures. Look back at the history of infographics. It’s all been done before, and usually by hand! Draw something with a pencil (or pen [...] but NOT a computer!). Sketch often: The cat asleep. The view from the bus. The bus. Personally, I listen to music – mostly jazz – a lot." (Nigel Holmes) [1]

"My design approach requires that I immerse myself deeply in the problem domain and available data very early in the project, to get a feel for the unique characteristics of the data, its 'texture' and the affordances it brings. It is very important that the results from these explorations, which I also discuss in detail with my clients, can influence the basic concept and main direction of the project. To put it in Hans Rosling’s words, you need to 'let the data set change your mind set'." (Moritz Stefaner) [1]

"My main advice is not to be disheartened. Sometimes the data don’t show what you thought they would, or they aren’t available in a usable or comparable form. But [in my world] sometimes that research still turns up threads a reporter could pursue and turn into a really interesting story – there just might not be a viz in it. Or maybe there’s no story at all. And that’s all okay. At minimum, you’ve still hopefully learned something new in the process about a topic, or a data source (person or database), or a 'gotcha' in a particular dataset – lessons that can be applied to another project down the line." (Alyson Hurt) [1]

"Research is key. Data, without interpretation, is just a jumble of words and numbers – out of context and devoid of meaning. If done well, research not only provides a solid foundation upon which to build your graphic/visualisation, but also acts as a source of inspiration and a guidebook for creativity. A good researcher must be a team player with the ability to think critically, analytically, and creatively. They should be a proactive problem solver, identifying potential pitfalls and providing various roadmaps for overcoming them. In short, their inclusion should amplify, not restrain, the talents of others." (Amanda Hobbs) [1]

"The capability to cope with the technological dimension is a key attribute of successful students: coding – more as a logic and a mindset than a technical task – is becoming a very important asset for designers who want to work in Data Visualization. It doesn’t necessarily mean that you need to be able to code to find a job, but it helps a lot in the design process. The profile in the (near) future will be a hybrid one, mixing competences, skills and approaches currently separated into disciplinary silos." (Paolo Ciuccarelli) [1]

"The experience offered by a visualisation influences the interpreting phase of understanding. Whereas tone embodies a continuum, the judgement of the most suitable experience is more distinct and concerns different methods of enabling interpretation: explanatory, exhibitory or exploratory [...] you degrade its existence and malign its importance. Words are not your enemy. Complex thoughts are not your enemy. Confusion is. Don’t confuse your audience. Don’t talk down to them, don’t mislead them, and certainly don’t lie to them." (Amanda Hobbs) [1]

"The key difference I think in producing data visualization/infographics in the service of journalism versus other contexts (like art) is that there is always an underlying, ultimate goal: to be useful. Not just beautiful or efficient – although something can (and should!) be all of those things. But journalism presents a certain set of constraints. A journalist has to always ask the question: How can I make this more useful? How can what I am creating help someone, teach someone, show someone something new?" (Lena Groeger) [1]

"There's a strand of the data viz world that argues that everything could be a bar chart. That’s possibly true but also possibly a world without joy." (Amanda Cox [interview in Scott Berinato, "The Power of Visualization’s 'Aha!' Moments", Harvard Business Review], 2013) [1]

"Think of the reader – a specific reader, like a friend who’s curious but a novice to the subject and to data-viz – when designing the graphic. That helps. And I rely pretty heavily on that introductory text that runs with each graphic – about 100 words, usually, that should give the new-to-the-subject reader enough background to understand why this graphic is worth engaging with and sets them up to understand and contextualize the takeaway. And annotate the graphic itself. If there’s a particular point you want the reader to understand, make it! Explicitly!" (Katie Peek) [1]

"Using our eyes to switch between different views that are visible simultaneously has much lower cognitive load than consulting our memory to compare a current view with what was seen before." (Tamara Munzner) [1]

"We should pay as much attention to understanding the project’s goal in relation to its audience. This involves understanding principles of perception and cognition in addition to other relevant factors, such as culture and education levels, for example. More importantly, it means carefully matching the tasks in the representation to our audience’s needs, expectations, expertise, etc. Visualizations are human-centred projects, in that they are not universal and will not be effective for all humans uniformly. As producers of visualizations, whether devised for data exploration or communication of information, we need to take into careful consideration those on the other side of the equation, and who will face the challenges of decoding our representations." (Isabel Meirelles) [1]

"What is the least this can be? What is the minimum result that will 1) be factually accurate, 2) present the core concepts of this story in a way that a general audience will understand, and 3) be readable on a variety of screen sizes (desktop, mobile, etc.)? And then I judge what else can be done based on the time I have. Certainly, when we’re down to the wire it’s no time to introduce complex new features that require lots of testing and could potentially break other, working features." (Alyson Hurt) [1]

"When I first started learning about visualisation, I naively assumed that datasets arrived at your doorstep ready to roll. Begrudgingly I accepted that before you can plot or graph anything, you have to find the data, understand it, evaluate it, clean it, and perhaps restructure it." (Marcia Gray) [1]

"When something is not harmonious, it’s either boring or chaotic. At one extreme is a visual experience that is so bland that the viewer is not engaged. The human brain will reject understimulating information. At the other extreme is a visual experience that is so overdone, so chaotic, that the viewer can’t stand to look at it. The human brain rejects what it cannot organize, what it cannot understand." (Jill Morton) [1]

"When the data has been explored sufficiently, it is time to sit down and reflect – what were the most interesting insights? What surprised me? What were the recurring themes and facts throughout all views on the data? In the end, what do we find most important and most interesting? These are the things that will govern which angles and perspectives we want to emphasize in the subsequent project phases." (Moritz Stefaner) [1]

"You don’t get there [beauty] with cosmetics, you get there by taking care of the details, by polishing and refining what you have. This is ultimately a matter of trained taste, or what German speakers call fingerspitzengefühl ('finger-tip-feeling')." (Oliver Reichenstein) [1]

References:
[1] Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019

💠SQL Server: SQL Server 2012 is almost here [new feature]

I have been quite quiet for the past 3-4 months, not for lack of blogging material but for lack of time. Instead of writing I preferred reading, diving into some special topics related to SQL Server (e.g. tempdb and security); I plan to post some of my notes in the near future. For a short time I was busy studying for the ITIL® v3 Foundation Certification, and the topics on Knowledge Management gave me more ideas for several posts waiting in the pipeline. I also started the online “Introduction to Databases” course offered by Stanford University, attempting a more scholastic approach to the topic; of particular importance is the material on Relational Algebra, which I didn’t have the chance to study in the past.

From my perspective, two important events related to SQL Server took place during this time: the launch of Dynamics AX 2012 and, more recently, the introduction of SQL Server 2012 at the PASS (Professional Association for SQL Server) Summit 2011.

SQL Server 2012

Four of the newest SQL Server products were unveiled at PASS Summit 2011: SQL Server 2012 (codenamed Denali), Power View (codenamed Crescent), ColumnStore Index (codenamed Apollo) and SQL Server Data Tools (codenamed Juneau). The streamed PASS 2011 sessions are available online, with quite interesting material on SQL Server topics like application and database development, database administration and deployment, BI, etc. If you want to learn more about SQL Server, check the CTP 3 Product Guide, which contains datasheets, white papers, technical presentations, demonstrations and links to videos, or the SQL Server 2012 Developer Training Kit Preview (requires Microsoft’s Web Platform Installer).

Dynamics AX 2012

Because I’ve been spending more and more time with Dynamics AX, Microsoft’s ERP (Enterprise Resource Planning) solution, I’d like to include related content in my posts, at least by presenting resources if I can’t yet get into the technical details. As its backend is based mainly on SQL Server, AX is the perfect environment in which to see SQL Server at work, or to perform configuration and administration activities. In addition, the AX material related to SQL Server (best/good practices, methodologies, various other papers) could be applied to other environments. I salute Microsoft’s decision to make more TechNet and MSDN content publicly available; previously most of the technical content was accessible mainly through Microsoft’s Partner Network and Customer Network. A good compilation of resources is available on the AX Technical Support Blog and the Inside Microsoft Dynamics AX blog.

As pointed out above, Microsoft Dynamics AX 2012 was launched recently (see global and local launch events). It’s interesting to note that, with this edition, SSRS becomes the reporting platform for AX, a considerable step forward.

Books

As for free books, there are 3 “new” titles: Jonathan Kehayias and Ted Krueger’s book Troubleshooting SQL Server: A Guide for the Accidental DBA (zipped PDF), which provides a basic approach to troubleshooting, Fabiano Amorim’s book on Complete Showplan Operators (PDF, Epub), and Ross Mistry and Stacia Misner’s Introducing Microsoft SQL Server 2008 R2 (PDF, requires registration).

11 November 2011

📉Graphical Representation: Matrices (Just the Quotes)

"The problem that still remains to be solved is that of the orderable matrix, that needs the use of imagination […] When the two components of a data table are orderable, the normal construction is the orderable matrix. Its permutations show the analogy and the complementary nature that exist between the algorithmic treatments and the graphical treatments." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"The square has always had a no-nonsense sort of image. Stable, solid, and - well - square. Perhaps that's why it is the shape used in business visuals in those rare cases where a visual is even bothered with. Flip through most business books and you'll find precious few places for your eye to stop and your visual brain to engage. But when you do, the shape of the graphic, chart, matrix, table, or diagram is certainly square. It's a comfortable shape, which makes it a valuable implement in your kit of visual communication tools." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"Characterizing a two-dimensional scatterplot is relatively easy, particularly with the full range of recently developed graphical enhancements at hand. However, standard patterns to watch for in three-dimensional plots are not as well understood as they are in many two-dimensional plots. We can certainly look for very general characteristics like curvature in three-dimensional plots, but it may not be clear how or if the curvature itself should be characterized. It is also possible to obtain useful insights into higher-dimensional scatterplots, but for the most part their interpretation must rely on lower-dimensional constructions. Similar statements apply to scatterplot matrices and various linked plots." (R Dennis Cook, "Regression Graphics: Ideas for Studying Regressions through Graphics", 1998)

"The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary." (Michael Friendly, "Extending Mosaic Displays: Marginal, Conditional, and Partial Views of Categorical Data", 199)

"Two types of graphic organizers are commonly used for comparison: the Venn diagram and the comparison matrix [...] the Venn diagram provides students with a visual display of the similarities and differences between two items. The similarities between elements are listed in the intersection between the two circles. The differences are listed in the parts of each circle that do not intersect. Ideally, a new Venn diagram should be completed for each characteristic so that students can easily see how similar and different the elements are for each characteristic used in the comparison." (Robert J. Marzano et al, "Classroom Instruction that Works: Research-based strategies for increasing student achievement", 2001)

"Largeness comes in different forms and has many different effects. Whereas some tasks remain easy, others become obstinately difficult. Largeness is not just an increase in dataset size. [...] Largeness may mean more complexity - more variables, more detail (additional categories, special cases), and more structure (temporal or spatial components, combinations of relational data tables). Again this is not so much of a problem with small datasets, where the complexity will be by definition limited, but becomes a major problem with large datasets. They will often have special features that do not fit the standard case by variable matrix structure well-known to statisticians." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"One big advantage of parallel coordinate plots over scatterplot matrices (i.e., the matrix of scatterplots of all variable pairs) is that parallel coordinate plots need less space to plot the same amount of data. On the other hand, parallel coordinate plots with p variables show only p - 1 adjacencies. However, adjacent variables reveal most of the information in a parallel coordinate plot. Reordering variables in a parallel coordinate plot is therefore essential." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)
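
The counting argument in the quote above can be made concrete with a small sketch. It is only an illustration: the variable names are made up, and the helper functions `adjacent_pairs` and `all_pairs` are hypothetical names introduced here, not something taken from the book. For p variables, one axis ordering of a parallel coordinate plot shows p - 1 neighbouring pairs, while a scatterplot matrix covers all p(p - 1)/2 pairs.

```python
from itertools import combinations


def adjacent_pairs(order):
    """Pairs of neighbouring axes for one axis ordering of a parallel coordinate plot."""
    return list(zip(order, order[1:]))


def all_pairs(variables):
    """Every unordered variable pair, i.e. the panels of a scatterplot matrix (one triangle)."""
    return list(combinations(variables, 2))


# Hypothetical variable names, used only for illustration.
variables = ["a", "b", "c", "d", "e"]

shown = adjacent_pairs(variables)
possible = all_pairs(variables)

print(len(shown))     # 4  (p - 1 adjacencies for p = 5)
print(len(possible))  # 10 (p * (p - 1) / 2 pairs for p = 5)

# Pairs that this particular ordering never places side by side.
hidden = [pair for pair in possible if pair not in shown]
print(hidden)
```

With five variables, six of the ten pairs are never adjacent in a single ordering, which is one way to read the authors' remark that reordering is essential.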

"Whereas charts generally focus on a trend or comparison, tables organize data for the reader to scan. Tables present data in an easy-read-format, or matrix. Tables arrange data in columns or rows so readers can make side-by-side comparisons. Tables work for many situations because they convey large amounts of data and have several variables for each item. Tables allow the reader to focus quickly on a specific item by scanning the matrix or to compare multiple items by scanning the rows or columns." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"With further similarities to small multiples, heatmaps enable us to perform rapid pattern matching to detect the order and hierarchy of different quantitative values across a matrix of categorical combinations. The use of a color scheme with decreasing saturation or increasing lightness helps create the sense of data magnitude ranking." (Andy Kirk, "Data Visualization: A successful design process", 2012)

"One problem for visualizing multiple views is that of laying out the plots. Indeed, there are some plots, such as scatterplot matrixes and trellis displays, that are formed just by arranging simpler plots according to certain rules. Scatterplot matrices, for example, arrange scatterplots side by side so that each variable in a dataset is graphed against the other variables, with the graphs being displayed as a row or a column of the matrix. This lets the user rapidly inspect all of the bivariate relationships among the variables, permitting the detection of outliers, nonlinearities, and other features of the data." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"A useful way to think about tables and graphics is to visualize layers. Just as photographic files may be manipulated in photo editing software using layers, data presentations are constructed by imagining that layers of an image are placed one on top of another. There are three general layers that apply to visual data presentations: (a) a frame that is typically a rectangle or matrix, (b) axes and coordinate systems (for graphics), and (c) data presented as numbers or geometric objects." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"A heatmap is a visualization where values contained in a matrix are represented as colors or color saturation. Heatmaps are great for visualizing multivariate data (data in which analysis is based on more than two variables per observation), where categorical variables are placed in the rows and columns and a numerical or categorical variable is represented as colors or color saturation." (Mario Döbler & Tim Großmann, "The Data Visualization Workshop", 2nd Ed., 2020)

📉Graphical Representation: Structure (Just the Quotes)

"Graphic charts have often been thought to be tools of those alone who are highly skilled in mathematics, but one needs to have a knowledge of only eighth-grade arithmetic to use intelligently even the logarithmic or ratio chart, which is considered so difficult by those unfamiliar with it. […] If graphic methods are to be most effective, those who are unfamiliar with charts must give some attention to their fundamental structure. Even simple charts may be misinterpreted unless they are thoroughly understood. For instance, one is not likely to read an arithmetic chart correctly unless he also appreciates the significance of a logarithmic chart." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"Structured information is any type of information that is arranged to show relationships between the minute, individual particles (bits) of information and the final presentation of this information in a logical arrangement with continuity from beginning to end." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Frequently we can increase the informativeness of a graph by removing structure from the data once we have identified it, so that subsequent plots are free of its dominating influence and can help us see finer structure or subtler effects. This usually means (1) partitioning the data, or (2) plotting differences or ratios, or (3) fitting a model and taking the residuals as a new set of data for further study." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)
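
The third option in the quote above, fitting a model and studying the residuals, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the model is an ordinary least-squares straight line (my choice, the authors do not prescribe one), the data are made up, and `fit_line` and `residuals` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """What is left of ys after the fitted straight line has been removed."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up data: a dominant upward trend (2x + 1) plus a small alternating wiggle.
xs = [0, 1, 2, 3, 4, 5]
ys = [1.5, 2.5, 5.5, 6.5, 9.5, 10.5]

res = residuals(xs, ys)
print([round(r, 2) for r in res])
```

Plotted raw, the values are dominated by the trend; the residuals all stay below 1 in absolute value and alternate in sign, so the finer structure that the trend was hiding becomes the main feature of the next plot.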

"The truth is that one display is better than another if it leads to more understanding. Often a simpler display, one that tries to accomplish less at one time, succeeds in conveying more insight. In order to understand complicated or subtle structure in the data we should be prepared to look at complicated displays when necessary, but to see any particular type of structure we should use the simplest display that shows it." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"In order to be easily understood, a display of information must have a logical structure which is appropriate for the user's knowledge and needs, and this structure must be clearly represented visually. In order to indicate structure, it is necessary to be able to emphasize, divide and relate items of information. Visual emphasis can be used to indicate a hierarchical relationship between items of information, as in the case of systems of headings and subheadings for example. Visual separation of items can be used to indicate that they are different in kind or are unrelated functionally, and similarly a visual relationship between items will imply that they are of a similar kind or bear some functional relation to one another. This kind of visual 'coding' helps the reader to appreciate the extent and nature of the relationship between items of information, and to adopt an appropriate scanning strategy." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"Maximizing data ink (within reason) is but a single dimension of a complex and multivariate design task. The principle helps conduct experiments in graphical design. Some of those experiments will succeed. There remain, however, many other considerations in the design of statistical graphics - not only of efficiency, but also of complexity, structure, density, and even beauty." (Edward R Tufte, "Data-Ink Maximization and Graphical Design", Oikos Vol. 58 (2), 1990)

"One important aspect of reality is improvisation; as a result of special structure in a set of data, or the finding of a visualization method, we stray from the standard methods for the data type to exploit the structure or the finding." (William S Cleveland, "Visualizing Data", 1993)

"The logarithm is one of many transformations that we can apply to univariate measurements. The square root is another. Transformation is a critical tool for visualization or for any other mode of data analysis because it can substantially simplify the structure of a set of data. For example, transformation can remove skewness toward large values, and it can remove monotone increasing spread. And often, it is the logarithm that achieves this removal." (William S Cleveland, "Visualizing Data", 1993)
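
A small numeric sketch of the claim above that the logarithm can remove skewness toward large values. The data are made up (each value doubles the previous one), and `skewness` is a hypothetical helper using the simple moment-based definition; neither comes from the book.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up measurements skewed toward large values: each one is twice the previous.
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# Base-10 logarithm of each measurement; the logged values are evenly spaced.
logged = [math.log10(v) for v in raw]

print(skewness(raw))     # clearly positive: a long tail toward large values
print(skewness(logged))  # approximately zero: the logged values are symmetric
```

The example is deliberately idealised; real measurements will rarely become perfectly symmetric, but the direction of the effect is the one the quote describes.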

"A good graph displays relationships and structures that are difficult to detect by merely looking at the data." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Stacked bar graphs do not show data structure well. A trend in one of the stacked variables has to be deduced by scanning along the vertical bars. This becomes especially difficult when the categories do not move in the same direction." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The content and context of the numerical data determines the most appropriate mode of presentation. A few numbers can be listed, many numbers require a table. Relationships among numbers can be displayed by statistics. However, statistics, of necessity, are summary quantities so they cannot fully display the relationships, so a graph can be used to demonstrate them visually. The attractiveness of the form of the presentation is determined by word layout, data structure, and design." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"A grammar of graphics facilitates coordinated activity in a set of relatively autonomous components. This grammar enables us to develop a system in which adding a graphic to a frame (say, a surface) requires no adjustments or changes in definitions other than the simple message 'add this graphic'. Similarly, we can remove graphics, transform scales, permute attributes, and make other alterations without redefining the basic structure." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Merely drawing a plot does not constitute visualization. Visualization is about conveying important information to the reader accurately. It should reveal information that is in the data and should not impose structure on the data." (Robert Gentleman, "Bioinformatics and Computational Biology Solutions using R and Bioconductor", 2005)

"A diagram is a graphic shorthand. Though it is an ideogram, it is not necessarily an abstraction. It is a representation of something in that it is not the thing itself. In this sense, it cannot help but be embodied. It can never be free of value or meaning, even when it attempts to express relationships of formation and their processes. At the same time, a diagram is neither a structure nor an abstraction of structure." (Peter Eisenman, "Written Into the Void: Selected Writings, 1990-2004", 2007)

"Data visualization [...] expresses the idea that it involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid readers or viewers in seeing the structure in the data. The term data visualization is related to the new field of information visualization. This includes visualization of all kinds of information, not just of data, and is closely associated with research by computer scientists." (Antony Unwin et al, "Introduction" [in "Handbook of Data Visualization"], 2008)

"Tables work in a variety of situations because they convey large amounts of data in a condensed fashion. Use tables in the following situations: (1) to structure data so the reader can easily pick out the information desired, (2) to display in a chart when the data contains too many variables or values, and (3) to display exact values that are more important than a visual moment in time." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Models are formal structures represented in mathematics and diagrams that help us to understand the world. Mastery of models improves your ability to reason, explain, design, communicate, act, predict, and explore." (Scott E Page, "The Model Thinker", 2018)

"Data storytelling can be defined as a structured approach for communicating data insights using narrative elements and explanatory visuals." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Beyond the design of individual charts, the sequence of data visualizations creates grammar within the exposition. Cohesive visualizations follow common narrative structures to fully express their message. Order matters." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

📉Graphical Representation: Continuity (Just the Quotes)

"In certain respects, line graphs are uniquely applicable to particular graphic requirements for which a bar or circle chart could not be substituted. Strictly speaking, the line graph must be used to portray changes in a continuous variable, since technically such a variable must be represented by a line and not by 'points' or 'bars'. Line graphs are often uniquely applicable to problems of analysis, particularly when it is essential to visualize a trend, observe the behavior of a set of variables through time, or portray the same variable in differing time periods." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Although in most cases the actual value designated by a bar is determined by the location of the end of the bar, many people associate the length or area of the bar with its value. As long as the scale is linear, starts at zero, is continuous, and the bars are the same width, this presents no problem. When any of these conditions are changed, the potential exists that the graph will be misinterpreted." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"Use of a histogram should be strictly reserved for continuous numerical data or for data that can be effectively modelled as continuous […]. Unlike bar charts, therefore, the bars of a histogram corresponding to adjacent intervals should not have gaps between them, for obvious reasons." (Alan Graham, "Developing Thinking in Statistics", 2006)

"When it comes to drawing a picture of continuous data, you need to think through carefully where one interval ends and the next one begins. Failing to do this can result in overlaps or gaps between adjacent intervals, which can cause confusion." (Alan Graham, "Developing Thinking in Statistics", 2006)
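One way to read Graham's point about interval boundaries is as a counting rule: each value must fall in exactly one interval. The sketch below is only an illustration and is not taken from the book. It assumes half-open intervals [lower, upper) with the final interval closed on the right, and the function name `assign_bins` and the sample values are hypothetical.

```python
import bisect

def assign_bins(values, edges):
    """Count values per interval using half-open intervals [lower, upper).

    Assumption for this sketch: the last interval also includes its upper
    edge, so the largest permitted value is not dropped.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # value lies outside the covered range
        # bisect_right returns the position of the first edge greater than v
        i = bisect.bisect_right(edges, v) - 1
        i = min(i, len(counts) - 1)  # v equal to the final edge goes in the last bin
        counts[i] += 1
    return counts

edges = [0, 10, 20, 30]                # hypothetical interval boundaries
values = [0, 9.9, 10, 19.9, 20, 30]    # includes values sitting exactly on edges
counts = assign_bins(values, edges)
print(counts)
```

With this rule a value of 10 is counted once, in the second interval, so adjacent bars neither overlap nor leave a gap.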

"Like a black hole or any similar rent in the warp and woof of space-time, a singularity is a disruption of continuity, a break with the past. It is a point at which everything changes, and a point beyond which we can’t see." (Scott Rosenberg, "Dreaming in Code", 2007)

"The first requirement of a beautiful visualization is that it is novel, fresh, or unique. It is difficult (though not impossible) to achieve the necessary novelty using default formats. In most situations, well-defined formats have well-defined, rational conventions of use: line graphs for continuous data, bar graphs for discrete data, pie graphs for when you are more interested in a pretty picture than conveying knowledge." (Noah Iliinsky, "On Beauty", [in "Beautiful Visualization"] 2010)

"Scatterplots are still the go-to visualization when one is examining relationships between continuous variables. One of the problems with the traditional scatterplot is that all data points are presented as if they are on equal footing. [...] Bubble maps are scatterplots with added dimensions. The most common usage is to add weight to individual data points based on population." (Christopher Lysy, "Developments in Quantitative Data Display and Their Implications for Evaluation", 2013)

"Broadly defined, data means events that are captured and made available for analysis. A data source is a consistent record of these events. And a data product translates this record of events into something that can easily be understood. [...] Data products can be organized and characterized by a series of continuums that describe the nature of the data and how it is presented." (Zach Gemignani et al, "Data Fluency", 2014)

"Complementary colors send a message of opposition but also of balance. A chart with saturated complementary colors is an aggressively colored chart in which the colors fight (equally) for their share of attention. Apply this rule when you intend to represent very distinct variables or those that for some reason you want to show as contrasting each other. Do not use complementary colors when variables have some form of continuity or order." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"The law of continuity states that we interpret images so as not to generate abrupt transitions or otherwise create images that are more complex. […] we can arbitrarily fill in the missing elements to complete a pattern. It’s also the case of time series, in which we assume that data points in the future will be a smooth continuation of the past. […] In a line chart, those series with a similar slope (that is, they appear to follow the same direction) are understood as belonging to the same group." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"A histogram represents the frequency distribution of the data. Histograms are similar to bar charts but group numbers into ranges. Also, a histogram lets you show the frequency distribution of continuous data. This helps in analyzing the distribution (for example, normal or Gaussian), any outliers present in the data, and skewness." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"A well-designed graph clearly shows you the relevant end points of a continuum. This is especially important if you’re documenting some actual or projected change in a quantity, and you want your readers to draw the right conclusions. […]" (Daniel J Levitin, "Weaponized Lies", 2017)

10 November 2011

📉Graphical Representation: Ink (Just the Quotes)

"Co-ordinate ruling does not appear prominently on most original charts because the ruling is usually printed in some color of ink distinct from the curve itself. When, however, a chart is reproduced in a line engraving the co-ordinate lines come out the same color as the curve or other important data, and there may be too little contrast to assist the reader." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"Correct emphasis is basic to effective graphic presentation. Intensity of color is the simplest method of obtaining emphasis. For most reproduction purposes black ink on a white page is most generally used. Screens, dots and lines can, of course, be effectively used to give a gradation of tone from light grey to solid black. When original charts are the subjects of display presentation, use of colors is limited only by the subject and the emphasis desired." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Graphical excellence is the well-designed presentation of interesting data - a matter of substance, of statistics, and of design. Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency. Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space. Graphical excellence is nearly always multivariate. And graphical excellence requires telling the truth about the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The interior decoration of graphics generates a lot of ink that does not tell the viewer anything new. The purpose of decoration varies - to make the graphic appear more scientific and precise, to enliven the display, to give the designer an opportunity to exercise artistic skills. Regardless of its cause, it is all non-data-ink or redundant data-ink, and it is often chartjunk." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"A convenient measure of the extent to which this practice is in use is Tufte's 'data-ink ratio'. This measure is the ratio of the amount of ink used in graphing the data to the total amount of ink in the graph. The closer to zero this ratio gets, the worse the graph. The notion of the data-ink ratio brings us to the second principle of bad data display." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984)

"Graphics are almost always going to improve as they go through editing, revision, and testing against different design options. The principles of maximizing data-ink and erasing generate graphical alternatives and also suggest a direction in which revisions should move." (Edward R Tufte, "Data-Ink Maximization and Graphical Design", Oikos Vol. 58 (2), 1990)

"Maximizing data ink (within reason) is but a single dimension of a complex and multivariate design task. The principle helps conduct experiments in graphical design. Some of those experiments will succeed. There remain, however, many other considerations in the design of statistical graphics - not only of efficiency, but also of complexity, structure, density, and even beauty." (Edward R Tufte, "Data-Ink Maximization and Graphical Design", Oikos Vol. 58 (2), 1990)

"This pie chart violates several of the rules suggested by the question posed in the introduction. First, immediacy: the reader has to turn to the legend to find out what the areas represent; and the lack of color makes it very difficult to determine which area belongs to what code. Second, the underlying structure of the data is completely ignored. Third, a tremendous amount of ink is used to display eight simple numbers." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Graphical illustrations should be simple and pleasing to the eye, but the presentation must remain scientific. In other words, we want to avoid those graphical features that are purely decorative while keeping a critical eye open for opportunities to enhance the scientific inference we expect from the reader. A good graphical design should maximize the proportion of the ink used for communicating scientific information in the overall display." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Aligning on data ink can be a powerful way to build relationships across charts. It can be used to obscure the lines between charts, making the composition feel more seamless. [...] Alignment paradigms can also influence the layout design needed. [...] The layout added to the alignment further supports this relationship." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

📉Graphical Representation: Index Numbers (Just the Quotes)

"To a very striking degree our culture has become a Statistical culture. Even a person who may never have heard of an index number is affected [...] by [...] of those index numbers which describe the cost of living. It is impossible to understand Psychology, Sociology, Economics, Finance or a Physical Science without some general idea of the meaning of an average, of variation, of concomitance, of sampling, of how to interpret charts and tables." (Carrol D Wright, 1887)

"In any chart where index numbers are used the greatest care should be taken to select as unity a set of conditions thoroughly typical and representative. It is frequently best to take as unity the average of a series of years immediately preceding the years for which a study is to be made. The series of years averaged to represent unity should, if possible, be so selected that they will include one full cycle or wave of fluctuation. If one complete cycle involves too many years, the years selected as unity should be taken in equal number on either side of a year which represents most nearly the normal condition." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"The use of two or more amount scales for comparisons of series in which the units are unlike and, therefore, not comparable [...] generally results in an ineffective and confusing presentation which is difficult to understand and to interpret. Comparisons of this nature can be much more clearly shown by reducing the components to a comparable basis as percentages or index numbers." (Rufus R Lutz, "Graphic Presentation Simplified", 1949)

"The economists, of course, have great fun - and show remarkable skill - in inventing more refined index numbers. Sometimes they use geometric averages instead of arithmetic averages (the advantage here being that the geometric average is less upset by extreme oscillations in individual items), sometimes they use the harmonic average. But these are all refinements of the basic idea of the index number [...]" (Michael J Moroney, "Facts from Figures", 1951)
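Moroney's remark about geometric versus arithmetic averages can be checked with a few lines of arithmetic. The following is a minimal sketch, not a method from the book: the price relatives are invented for illustration, the function names are hypothetical, and the index is assumed to be an unweighted average scaled to a base of 100.

```python
import math

def arithmetic_index(relatives):
    """Unweighted arithmetic mean of price relatives, scaled so base = 100."""
    return 100 * sum(relatives) / len(relatives)

def geometric_index(relatives):
    """Unweighted geometric mean of price relatives, scaled so base = 100."""
    mean_log = sum(math.log(r) for r in relatives) / len(relatives)
    return 100 * math.exp(mean_log)

# Hypothetical price relatives (current price / base price):
# three items unchanged, one item quadrupled.
relatives = [1.0, 1.0, 1.0, 4.0]

arith = arithmetic_index(relatives)
geo = geometric_index(relatives)
print(round(arith, 2), round(geo, 2))
```

The single extreme item pulls the arithmetic index up to 175, while the geometric index rises only to about 141, which is the sense in which it is "less upset" by extreme movements.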

"Index numbers are today one of the most widely used statistical devices…They are used to take the pulse of the economy and they have come to be used as indicators of inflationary or deflationary tendencies." (George Simpson & Fritz Kafka, "Basic Statistics", 1952)

"The great trouble with all business data upon which the statisticians and economists base their forecasts is that they are ancient history before they ever become available. They pertain to conditions which existed some weeks or months previous. The figures for what is going on at the moment in all lines of business are never available. A business index, while of great interest and value, is always historical and never predictive." (Walter E Weld, "How to Chart; Facts from Figures with Graphs", 1959)

"The fact that index numbers attempt to measure changes of items gives rise to some knotty problems. The dispersion of a group of products increases with the passage of time, principally because some items have a long-run tendency to fall while others tend to rise. Basic changes in the demand is fundamentally responsible. The averages become less and less representative as the distance from the period increases." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"A statistical index has all the potential pitfalls of any descriptive statistic - plus the distortions introduced by combining multiple indicators into a single number. By definition, any index is going to be sensitive to how it is constructed; it will be affected both by what measures go into the index and by how each of those measures is weighted." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Once these different measures of performance are consolidated into a single number, that statistic can be used to make comparisons […] The advantage of any index is that it consolidates lots of complex information into a single number. We can then rank things that otherwise defy simple comparison […] Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components. As a result, indices range from useful but imperfect tools to complete charades." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"When using indexes in a data set, using an average aggregation is appropriate as long as you only use it at the individual region, month, and visitor type level (i.e., the lowest granularity of the data). You cannot use an average of the average to represent the total." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)
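The warning about an "average of the average" is easy to reproduce with unequal group sizes. This is a generic sketch under stated assumptions rather than the authors' example: the region names and values are made up, and plain unweighted means stand in for whatever index the data set actually holds.

```python
# Hypothetical index values recorded at the lowest granularity, grouped by region.
groups = {
    "north": [100, 110, 120],  # three observations
    "south": [200],            # a single observation
}

# Average within each region (valid at this level of detail).
group_means = {name: sum(vals) / len(vals) for name, vals in groups.items()}

# Averaging the regional averages ignores how many rows each region has.
mean_of_means = sum(group_means.values()) / len(group_means)

# Averaging all underlying rows weights every observation equally.
all_values = [x for vals in groups.values() for x in vals]
overall_mean = sum(all_values) / len(all_values)

print(group_means, mean_of_means, overall_mean)
```

Here the average of the regional averages is 155, while the average over all rows is 132.5, so the two only agree when every group has the same number of observations.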

09 November 2011

📉Graphical Representation: Inquiry (Just the Quotes)

"Statistics are numerical statements of facts in any department of inquiry, placed in relation to each other; statistical methods are devices for abbreviating and classifying the statements and making clear the relations." (Arthur L Bowley, "An Elementary Manual of Statistics", 1934)

"One of the greatest values of the graphic chart is its use in the analysis of a problem. Ordinarily, the chart brings up many questions which require careful consideration and further research before a satisfactory conclusion can be reached. A properly drawn chart gives a cross-section picture of the situation. While charts may bring out hidden facts in tables or masses of data, they cannot take the place of careful analysis. In fact, charts may be dangerous devices when in the hands of those unwilling to base their interpretations upon careful study. This, however, does not detract from their value when they are properly used as aids in solving statistical problems." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"The histogram, with its columns of area proportional to number, like the bar graph, is one of the most classical of statistical graphs. Its combination with a fitted bell-shaped curve has been common since the days when the Gaussian curve entered statistics. Yet as a graphical technique it really performs quite poorly. Who is there among us who can look at a histogram-fitted Gaussian combination and tell us, reliably, whether the fit is excellent, neutral, or poor? Who can tell us, when the fit is poor, of what the poorness consists? Yet these are just the sort of questions that a good graphical technique should answer at least approximately." (John W Tukey, "The Future of Processes of Data Analysis", 1965)

"Statistical techniques do not solve any of the common-sense difficulties about making causal inferences. Such techniques may help organize or arrange the data so that the numbers speak more clearly to the question of causality - but that is all statistical techniques can do. All the logical, theoretical, and empirical difficulties attendant to establishing a causal relationship persist no matter what type of statistical analysis is applied." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The use of statistical methods to analyze data does not make a study any more 'scientific', 'rigorous', or 'objective'. The purpose of quantitative analysis is not to sanctify a set of findings. Unfortunately, some studies, in the words of one critic, 'use statistics as a drunk uses a street lamp, for support rather than illumination'. Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"There are as many types of questions as components in the information." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"Overall [...] everyone also has a need to analyze data. The ability to analyze data is vital in its understanding of product launch success. Everyone needs the ability to find trends and patterns in the data and information. Everyone has a need to ‘discover or reveal (something) through detailed examination’, as our definition says. Not everyone needs to be a data scientist, but everyone needs to drive questions and analysis. Everyone needs to dig into the information to be successful with diagnostic analytics. This is one of the biggest keys of data literacy: analyzing data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution." (Edward R Tufte, "Envisioning Information", 1990)

"Data analysis is rarely as simple in practice as it appears in books. Like other statistical techniques, regression rests on certain assumptions and may produce unrealistic results if those assumptions are false. Furthermore it is not always obvious how to translate a research question into a regression model." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Data analysis [...] begins with a dataset in hand. Our purpose in data analysis is to learn what we can from those data, to help us draw conclusions about our broader research questions. Our research questions determine what sort of data we need in the first place, and how we ought to go about collecting them. Unless data collection has been done carefully, even a brilliant analyst may be unable to reach valid conclusions regarding the original research questions." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

"The application of the same techniques and scales of enquiry to each and every district meant that the resultant maps and statistical tables all contained the same sorts of information and were constructed and tabulated in the same manner. They therefore obscured or denied local nuances and particular circumstances." (Matthew H Edney, "Mapping an Empire: The Geographical Construction of British India, 1765–1843", 1999)

"When displaying information visually, there are three questions one will find useful to ask as a starting point. Firstly and most importantly, it is vital to have a clear idea about what is to be displayed; for example, is it important to demonstrate that two sets of data have different distributions or that they have different mean values? Having decided what the main message is, the next step is to examine the methods available and to select an appropriate one. Finally, once the chart or table has been constructed, it is worth reflecting upon whether what has been produced truly reflects the intended message. If not, then refine the display until satisfied; for example if a chart has been used would a table have been better or vice versa?" (Jenny Freeman et al, "How to Display Data", 2008)

"Data always vary randomly because the object of our inquiries, nature itself, is also random. We can analyze and predict events in nature with an increasing amount of precision and accuracy, thanks to improvements in our techniques and instruments, but a certain amount of random variation, which gives rise to uncertainty, is inevitable." (Alberto Cairo, "The Functional Art", 2011)

"The final step in creating your graphic is to refine it. Step back and look at it with fresh eyes. Is there anything that could be removed? Or anything that should be removed because it is distracting? Consider each element in your figure and question whether it contributes enough to your overall goal to justify its contribution. Also consider whether there is anything that could be represented more clearly. Perhaps you have been so effective at simplifying your graphic that you could now include another point in the same figure. Another method of refinement is to check the placement and alignment of your labels. They should be unobtrusive and clearly indicate which object they refer to. Consistency in fonts and alignment of labels can make the difference between something that is easy and pleasant to read, and something that is cluttered and frustrating." (Felice C Frankel & Angela H DePace, "Visual Strategies", 2012)

"Early exploration of a dataset can be overwhelming, because you don’t know where to start. Ask questions about the data and let your curiosities guide you. […] Make multiple charts, compare all your variables, and see if there are interesting bits that are worth a closer look. Look at your data as a whole and then zoom in on categories and individual data points. […] Subcategories, the categories within categories (within categories), are often more revealing than the main categories. As you drill down, there can be higher variability and more interesting things to see." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"There are myriad questions that we can ask from data today. As such, it’s impossible to write enough reports or design a functioning dashboard that takes into account every conceivable contingency and answers every possible question." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Visual Organizations benefit from routinely visualizing many different types and sources of data. Doing so allows them to garner a better understanding of what’s happening and why. Equipped with this knowledge, employees are able to ask better questions and make better business decisions." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"What would a successful outcome look like? If you only had a limited amount of time or a single sentence to tell your audience what they need to know, what would you say? In particular, I find that these last two questions can lead to insightful conversation." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"The most pragmatic way of beginning the data visualization process is with a question, and then making a chart that answers that question. […] Certain charts are better suited to answer certain questions than others, but you should take this relationship as a broad principle. Subtle changes in the question and in the chart design can impact the results. Having a clear goal in mind and knowing what type of visualization could be more effective can help us reduce the range of options of chart types and design choices." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"A data story starts out like any other story, with a beginning and a middle. However, the end should never be a fixed event, but rather a set of options or questions to trigger an action from the audience. Never forget that the goal of data storytelling is to encourage and energize critical thinking for business decisions." (James Richardson, 2017)

"Creating effective visualizations is hard. Not because a dataset requires an exotic and bespoke visual representation - for many problems, standard statistical charts will suffice. And not because creating a visualization requires coding expertise in an unfamiliar programming language [...]. Rather, creating effective visualizations is difficult because the problems that are best addressed by visualization are often complex and ill-formed. The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Using a question as a title is a great way to guide the audience. The question helps you ensure that your charts respond directly to the question and when they do not, you can remove them. And that is the main point: You need to answer the question. If the data is not conclusive, say so. Give an explanation that relates back to your title and close the loop so that your audience is informed and gets the complete picture included in your analysis." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"What is the purpose of collecting data? People gather and store data for at least three different reasons that I can discern. One reason is that they want to build an arsenal of evidence with which to prove a point or defend an agenda that they already had to begin with. This path is problematic for obvious reasons, and yet we all find ourselves traveling on it from time to time. Another reason people collect data is that they want to feed it into an artificial intelligence algorithm to automate some process or carry out some task. […] A third reason is that they might be collecting data in order to compile information to help them better understand their situation, to answer questions they have in their mind, and to unearth new questions that they didn't think to ask." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"A good visualization can do more than just answer questions; it can help you see that there are other questions you need to answer." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"Data literacy empowers us to know the usage of data and how an algorithm can potentially be misleading, biased, and so forth; data literacy empowers us with the right type of skepticism that is needed to question everything." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"For a chart to be truly insightful, context is crucial because it provides us with the visual answer to an important question - 'compared with what'? No number on its own is inherently big or small – we need context to make that judgement. Common contextual comparisons in charts are provided by time" ('compared with last year...') and place" ('compared with the north...'). With ranking, context is provided by relative performance" ('compared with our rivals...')." (Alan Smith, "How Charts Work: Understand and explain data with confidence", 2022)

📉Graphical Representation: Completeness (Just the Quotes)

"The title for any chart presenting data in the graphic form should be so clear and so complete that the chart and its title could be removed from the context and yet give all the information necessary for a complete interpretation of the data. Charts which present new or especially interesting facts are very frequently copied by many magazines. A chart with its title should be considered a unit, so that anyone wishing to make an abstract of the article in which the chart appears could safely transfer the chart and its title for use elsewhere." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919) 

"A man's judgment cannot be better than the information on which he has based it. Give him no news, or present him only with distorted and incomplete data, with ignorant, sloppy, or biased reporting, with propaganda and deliberate falsehoods, and you destroy his whole reasoning process and make him somewhat less than a man." (Arthur H Sulzberger, [speech] 1948)

"Graphical methodology provides powerful diagnostic tools for conveying properties of the fitted regression, for assessing the adequacy of the fit, and for suggesting improvements. There is seldom any prior guarantee that a hypothesized regression model will provide a good description of the mechanism that generated the data. Standard regression models carry with them many specific assumptions about the relationship between the response and explanatory variables and about the variation in the response that is not accounted for by the explanatory variables. In many applications of regression there is a substantial amount of prior knowledge that makes the assumptions plausible; in many other applications the assumptions are made as a starting point simply to get the analysis off the ground. But whatever the amount of prior knowledge, fitting regression equations is not complete until the assumptions have been examined." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Labels should be complete but succinct. Long and complicated labels will defeat the viewer and therefore the purpose of the graph. Treat a label as a cue to jog the memory or to complete comprehension. Shorten long labels; avoid abbreviations unless they are universally understood; avoid repetition on the same graph. A title, for instance, should not repeat what is already in the axis labels. Be consistent in terminology." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"[…] a graphic with loose, incomplete information that is too verbose, vague or passive can actually impede your audience’s ability to make sense of the information at hand. If the graphic confuses or frustrates the audience, you’re likely to do more harm than good, leave them with more questions than answers and essentially turn them away from your publication." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Although as a graphics reporter you may not find yourself in ethical dilemmas as regularly as other journalists, there are some common scenarios that pop up from time to time. The first of these is the tendency to be faced with incomplete data and the temptation to “fill in the blanks” in order to complete your graphic. When information is incomplete or seems to be misleading, you must make every effort to find the missing links through more research and fact-finding. Often, you can consult the original source(s) of the data and, by asking a few more questions, fill in the missing pieces of the puzzle. If this doesn’t work, there are often ways to present the information you do have in a way that provides the reader with a bit more detail, while at the same time, makes it clear that there, in fact, are some missing numbers." (Jennifer George-Palilonis,"A Practical Guide to Graphics Reporting", 2006)

"Perception requires imagination because the data people encounter in their lives are never complete and always equivocal. [...] We also use our imagination and take shortcuts to fill gaps in patterns of nonvisual data. As with visual input, we draw conclusions and make judgments based on uncertain and incomplete information, and we conclude, when we are done analyzing the patterns, that out picture is clear and accurate. But is it?" (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"Information graphics are an essential component of technical communication. Very few technical documents or presentations can be considered complete without graphical elements to present some essential data. Because engineers are visually oriented, graphic aids allow their thoughts and ideas to be better understood by other engineers. Information graphics are essential in presenting data because they simplify the content, offer a visually pleasing alternative to gray text in a proposal or an article, and thereby invite interest." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Good visualization is a winding process that requires statistics and design knowledge. Without the former, the visualization becomes an exercise only in illustration and aesthetics, and without the latter, one of only analyses. On their own, these are fine skills, but they make for incomplete data graphics. Having skills in both provides you with the luxury - which is growing into a necessity - to jump back and forth between data exploration and storytelling." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Just because data is visualized doesn’t necessarily mean that it is accurate, complete, or indicative of the right course of action. Exhibiting a healthy skepticism is almost always a good thing." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"To become a great data analyst, you must be able to identify and deal with incomplete data and work to identify the data quality and accuracy issues in a data set." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Using a question as a title is a great way to guide the audience. The question helps you ensure that your charts respond directly to the question and when they do not, you can remove them. And that is the main point: You need to answer the question. If the data is not conclusive, say so. Give an explanation that relates back to your title and close the loop so that your audience is informed and gets the complete picture included in your analysis." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Data storytelling is a method of communicating information that is custom-fit for a specific audience and offers a compelling narrative to prove a point, highlight a trend, make a sale, or all of the above. [...] Data storytelling combines three critical components, storytelling, data science, and visualizations, to create not just a colorful chart or graph, but a work of art that carries forth a narrative complete with a beginning, middle, and end." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

📉Graphical Representation: Knowledge (Just the Quotes)

"Numerical facts, like other facts, are but the raw materials of knowledge, upon which our reasoning faculties must be exerted in order to draw forth the principles of nature. [...] Numerical precision is the soul of science [...]" (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"[…] it must be noticed that these diagrams do not naturally harmonize with the propositions of ordinary life or ordinary logic. […] The great bulk of the propositions which we commonly meet with are founded, and rightly founded, on an imperfect knowledge of the actual mutual relations of the implied classes to one another. […] one very marked characteristic about these circular diagrams is that they forbid the natural expression of such uncertainty, and are therefore only directly applicable to a very small number of such propositions as we commonly meet with." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)

"In working through graphics one has, however, to be exceedingly cautious in certain particulars, for instance, when a set of figures, dynamical or financial, are available they are, so long as they are tabulated, instinctively taken merely at their face value. When plotted, however, there is a temptation to extrapolation which is well nigh irresistible to the untrained mind. Sometimes the process can be safely employed, but it requires a rather comprehensive knowledge of the facts that lie back of the data to tell when to go ahead and when to stop." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Graphic charts have often been thought to be tools of those alone who are highly skilled in mathematics, but one needs to have a knowledge of only eighth-grade arithmetic to use intelligently even the logarithmic or ratio chart, which is considered so difficult by those unfamiliar with it. […] If graphic methods are to be most effective, those who are unfamiliar with charts must give some attention to their fundamental structure. Even simple charts may be misinterpreted unless they are thoroughly understood. For instance, one is not likely to read an arithmetic chart correctly unless he also appreciates the significance of a logarithmic chart." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"The effective communication of information in visual form, whether it be text, tables, graphs, charts or diagrams, requires an understanding of those factors which determine the 'legibility', 'readability' and 'comprehensibility', of the information being presented. By legibility we mean: can the data be clearly seen and easily read? By readability we mean: is the information set out in a logical way so that its structure is clear and it can be easily scanned? By comprehensibility we mean: does the data make sense to the audience for whom it is intended? Is the presentation appropriate for their previous knowledge, their present information needs and their information processing capacities?" (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"We envision information in order to reason about, communicate, document, and preserve that knowledge - activities nearly always carried out on two-dimensional paper and computer screen. Escaping this flatland and enriching the density of data displays are the essential tasks of information design." (Edward R Tufte, "Envisioning Information", 1990)

"The prevailing style of management must undergo transformation. A system cannot understand itself. The transformation requires a view from outside. The aim [...] is to provide an outside view - a lens - that I call a system of profound knowledge. It provides a map of theory by which to understand the organizations that we work in." (W Edwards Deming, "The New Economics for Industry, Government, Education", 1994)

"The representational nature of maps, however, is often ignored - what we see when looking at a map is not the word, but an abstract representation that we find convenient to use in place of the world. When we build these abstract representations we are not revealing knowledge as much as are creating it." (Alan MacEachren, "How Maps Work: Representation, Visualization, and Design", 1995)

"[...] when data is presented in certain ways, the patterns can be readily perceived. If we can understand how perception works, our knowledge can be translated into rules for displaying information. Following perception‐based rules, we can present our data in such a way that the important and informative patterns stand out. If we disobey the rules, our data will be incomprehensible or misleading." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"Knowledge workers and BI experts must continually evaluate the reports, dashboards, alerts, and other mechanisms for disseminating factual information to ensure the design facilitates insight." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"It is the responsibility of the ‘transformer’ to understand the data, to get all necessary information from the expert, to decide what is worth transmitting to the public, how to make it understandable, how to link it with general knowledge or with information already given in other charts. In this sense, the transformer is the trustee of the public." (Marie Neurath & Robin Kinross, "The transformer: principles of making Isotype charts", 2009)

"Good visualization is a winding process that requires statistics and design knowledge. Without the former, the visualization becomes an exercise only in illustration and aesthetics, and without the latter, one of only analyses. On their own, these are fine skills, but they make for incomplete data graphics. Having skills in both provides you with the luxury - which is growing into a necessity - to jump back and forth between data exploration and storytelling." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"The calculus of causation consists of two languages: causal diagrams, to express what we know, and a symbolic language, resembling algebra, to express what we want to know. The causal diagrams are simply dot-and-arrow pictures that summarize our existing scientific knowledge. The dots represent quantities of interest, called 'variables', and the arrows represent known or suspected causal relationships between those variables - namely, which variable 'listens' to which others." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The second rule of communication is to know what you want to achieve. Hopefully the aim is to encourage open debate, and informed decision-making. But there seems no harm in repeating yet again that numbers do not speak for themselves; the context, language and graphic design all contribute to the way the communication is received. We have to acknowledge we are telling a story, and it is inevitable that people will make comparisons and judgements, no matter how much we only want to inform and not persuade. All we can do is try to pre-empt inappropriate gut reactions by design or warning." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"When visuals are applied to data, they can enlighten the audience to insights that they wouldn’t see without charts or graphs. Many interesting patterns and outliers in the data would remain hidden in the rows and columns of data tables without the help of data visualizations. They connect with our visual nature as human beings and impart knowledge that couldn’t be obtained as easily using other approaches that involve just words or numbers." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"A semantic approach to visualization focuses on the interplay between charts, not just the selection of charts themselves. The approach unites the structural content of charts with the context and knowledge of those interacting with the composition. It avoids undue and excessive repetition by instead using referential devices, such as filtering or providing detail-on-demand. A cohesive analytical conversation also builds guardrails to keep users from derailing from the conversation or finding themselves lost without context. Functional aesthetics around color, sequence, style, use of space, alignment, framing, and other visual encodings can affect how users follow the script." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

See also the quotes on "Knowledge" in Data Science, Knowledge Management, Strategic Management

📉Graphical Representation: Large Values (Just the Quotes)

"Huge numbers are commonplace in our culture, but oddly enough the larger the number the less meaningful it seems to be." (Albert Sukoff, "Lotsa Hamburgers", Saturday Review of the Society, 1973)

"We know the laws of trial and error, of large numbers and probabilities. We know that these laws are part of the mathematical and mechanical fabric of the universe, and that they are also at play in biological processes. But, in the name of the experimental method and out of our poor knowledge, are we really entitled to claim that everything happens by chance, to the exclusion of all other possibilities?" (Albert Claude, [Nobel Prize Lecture], 1974)

"A graph presents a limited number of figures in a bold and forceful manner. To do this it usually must omit a large number of figures available on the subject. The choice of what graphic format to use is largely a matter of deciding what figures have the greatest significance to the intended reader and what figures he can best afford to skip." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"The logarithm is an extremely powerful and useful tool for graphical data presentation. One reason is that logarithms turn ratios into differences, and for many sets of data, it is natural to think in terms of ratios. […] Another reason for the power of logarithms is resolution. Data that are amounts or counts are often very skewed to the right; on graphs of such data, there are a few large values that take up most of the scale and the majority of the points are squashed into a small region of the scale with no resolution." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"The trouble with integers is that we have examined only the small ones. Maybe all the exciting stuff happens at really big numbers, ones we can’t get our hand on or even begin to think about in any very definite way. So maybe all the action is really inaccessible and we’re just fiddling around. Our brains have evolved to get us out of the rain, find where the berries are, and keep us from getting killed. Our brains did not evolve to help us grasp really large numbers or to look at things in a hundred thousand dimensions." (Paul Hauffman, "The Man Who Loves Only Numbers", The Atlantic Magazine, Vol 260, No 5, 1987)

"The law of truly large numbers states: With a large enough sample, any outrageous thing is likely to happen." (Frederick Mosteller, "Methods for Studying Coincidences Journal of the American Statistical Association, Volume 84, 1989)

"A good description of the data summarizes the systematic variation and leaves residuals that look structureless. That is, the residuals exhibit no patterns and have no exceptionally large values, or outliers. Any structure present in the residuals indicates an inadequate fit. Looking at the residuals laid out in an overlay helps to spot patterns and outliers and to associate them with their source in the data." (Christopher H Schrnid, "Value Splitting: Taking the Data Apart", 1991)

"Skewness is a measure of symmetry. For example, it's zero for the bell-shaped normal curve, which is perfectly symmetric about its mean. Kurtosis is a measure of the peakedness, or fat-tailedness, of a distribution. Thus, it measures the likelihood of extreme values." (John L Casti, "Reality Rules: Picturing the world in mathematics", 1992)

"Data that are skewed toward large values occur commonly. Any set of positive measurements is a candidate. Nature just works like that. In fact, if data consisting of positive numbers range over several powers of ten, it is almost a guarantee that they will be skewed. Skewness creates many problems. There are visualization problems. A large fraction of the data are squashed into small regions of graphs, and visual assessment of the data degrades. There are characterization problems. Skewed distributions tend to be more complicated than symmetric ones; for example, there is no unique notion of location and the median and mean measure different aspects of the distribution. There are problems in carrying out probabilistic methods. The distribution of skewed data is not well approximated by the normal, so the many probabilistic methods based on an assumption of a normal distribution cannot be applied." (William S Cleveland, "Visualizing Data", 1993)

"The logarithm is one of many transformations that we can apply to univariate measurements. The square root is another. Transformation is a critical tool for visualization or for any other mode of data analysis because it can substantially simplify the structure of a set of data. For example, transformation can remove skewness toward large values, and it can remove monotone increasing spread. And often, it is the logarithm that achieves this removal." (William S Cleveland, "Visualizing Data", 1993)

"Use a logarithmic scale when it is important to understand percent change or multiplicative factors. […] Showing data on a logarithmic scale can cure skewness toward large values." (Naomi B Robbins, "Creating More effective Graphs", 2005) 

"Comparisons are the lifeblood of empirical studies. We can’t determine if a medicine, treatment, policy, or strategy is effective unless we compare it to some alternative. But watch out for superficial comparisons: comparisons of percentage changes in big numbers and small numbers, comparisons of things that have nothing in common except that they increase over time, comparisons of irrelevant data. All of these are like comparing apples to prunes." (Gary Smith, "Standard Deviations", 2014)

📉Graphical Representation: Failure (Just the Quotes)

"The essential quality of graphic representations is clarity. If the diagram fails to give a clearer impression than the tables of figures it replaces, it is useless. To this end, we will avoid complicating the diagram by including too much data." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

"Where the values of a series are such that a large part the grid would be superfluous, it is the practice to break the grid thus eliminating the unused portion of the scale, but at the same time indicating the zero line. Failure to include zero in the vertical scale is a very common omission which distorts the data and gives an erroneous visual impression." (Calvin F Schmid, "Handbook of Graphic Presentation", 1954)

"[…] the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies. […] Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"[…] the partial scale break is a weak indicator that the reader can fail to appreciate fully; visually the graph is still a single panel that invites the viewer to see, inappropriately, patterns between the two scales. […] The partial scale break also invites authors to connect points across the break, a poor practice indeed; […]" (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38" (4) 1984)

"When a graph is constructed, quantitative and categorical information is encoded, chiefly through position, size, symbols, and color. When a person looks at a graph, the information is visually decoded by the person's visual system. A graphical method is successful only if the decoding process is effective. No matter how clever and how technologically impressive the encoding, it is a failure if the decoding process is a failure. Informed decisions about how to encode data can be achieved only through an understanding of the visual decoding process, which is called graphical perception." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Confusion and clutter are failures of design, not attributes of information. And so the point is to find design strategies that reveal detail and complexity - rather than to fault the data for an excess of complication. Or, worse, to fault viewers for a lack of understanding. Among the most powerful devices for reducing noise and enriching the content of displays is the technique of layering and separation, visually stratifying various aspects of the data." (Edward R Tufte, "Envisioning Information", 1990)

"What about confusing clutter? Information overload? Doesn't data have to be ‘boiled down’ and  ‘simplified’? These common questions miss the point, for the quantity of detail is an issue completely separate from the difficulty of reading. Clutter and confusion are failures of design, not attributes of information." (Edward R Tufte, "Envisioning Information", 1990)

"Audience boredom is usually a content failure, not a decoration failure." (Edward R Tufte, "The cognitive style of PowerPoint", 2003)

"Diagrams are a means of communication and explanation, and they facilitate brainstorming. They serve these ends best if they are minimal. Comprehensive diagrams of the entire object model fail to communicate or explain; they overwhelm the reader with detail and they lack meaning." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails. Some display methods lead to efficient, accurate decoding, and others lead to inefficient, inaccurate decoding. It is only through scientific study of visual perception that informed judgments can be made about display methods." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations. No matter how great the technology, a dashboard's success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately. Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think." (Stephen Few, "Information Dashboard Design", 2006)

"The Sixth Principle for the analysis and display of data: 'Analytical presentations ultimately stand or fall depending on the quality, relevance, and integrity of their content.' This suggests that the most effective way to improve a presentation is to get better content. It also suggests that design devices and gimmicks cannot salvage failed content." (Edward R Tufte, "Beautiful Evidence", 2006)

"The main goal of data visualization is its ability to visualize data, communicating information clearly and effectively. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex dataset by communicating its key aspects in a more intuitive way. Yet designers often tend to discard the balance between design and function, creating gorgeous data visualizations which fail to serve its main purpose - communicate information." (Vitaly Friedman, "Data Visualization and Infographics", Smashing Magazine, 2008)

"Designing good visual displays with an easy-to-use interactive system is difficult. The designer’s first attempts will usually fail, so it is critical that proposed systems be tested on at least several sets of typical users. These usability tests help the designer iterate to the best possible system." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"To be sure, data doesn’t always need to be visualized, and many data visualizations just plain suck. Look around you. It’s not hard to find truly awful representations of information. Some work in concept but fail because they are too busy; they confuse people more than they convey information [...]. Visualization for the sake of visualization is unlikely to produce desired results - and this goes double in an era of Big Data. Bad is still bad, even and especially at a larger scale." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"The goal of using data visualization to make better and faster decisions may lead people to think that any data visualization that is not immediately understood is a failure. Yes, a good visualization should allow you to see things that you might have missed, and to glean insights faster, but you still have to think." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"The rise of graphicacy and broader data literacy intersects with the technology that makes it possible and the critical need to understand information in ways current literacies fail. Like reading and writing, data literacy must become mainstream to fully democratize information access." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"A perfectly relevant visualization that breaks a few presentation rules is far more valuable - it’s better - than a perfectly executed, beautiful chart that contains the wrong data, communicates the wrong message, or fails to engage its audience. [...] The more relevant a data visualization is to its context, the more forgiving, to a point, we can be about its execution" (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)
