08 December 2011

📉Graphical Representation: Standards (Just the Quotes)

"Graphic representation by means of charts depends upon the super-position of special lines or curves upon base lines drawn or ruled in a standard manner. For the economic construction of these charts as well as their correct use it is necessary that the standard rulings be correctly designed." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Most authors would greatly resent it if they were told that their writings contained great exaggerations, yet many of these same authors permit their work to be illustrated with charts which are so arranged as to cause an erroneous interpretation. If authors and editors will inspect their charts as carefully as they revise their written matter, we shall have, in a very short time, a standard of reliability in charts and illustrations just as high as now found in the average printed page." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919) 

"The principles of charting and curve plotting are not at all complex, and it is surprising that many business men dodge the simplest charts as though they involved higher mathematics or contained some sort of black magic. [...] The trouble at present is that there are no standards by which graphic presentations can be prepared in accordance with definite rules so that their interpretation by the reader may be both rapid and accurate. It is certain that there will evolve for methods of graphic presentation a few useful and definite rules which will correspond with the rules of grammar for the spoken and written language." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919) 

"Though graphic presentations are used to a very large extent to-day there are at present no standard rules by which the person preparing a chart may know that he is following good practice. This is unfortunate because it permits everyone making a chart to follow his own sweet will. Many charts are being put out to-day from which it would seem that the person making them had tried deliberately to get up some method as different as possible from any which had ever been used previously." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919) 

"Though variety in method of charting is sometimes desirable in large reports where numerous illustrations must follow each other closely, or in wall exhibits where there must be a great number of charts in rapid sequence, it is better in general to use a variety of effects simply to attract attention, and to present the data themselves according to standard well-known methods." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"When large numbers of curves and charts are used by a corporation, it will be found advantageous to have certain standard abbreviations and symbols on the face of the chart so that information may be given in condensed form as a signal to anyone reading the charts." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"At the present time there is a total lack of standardization in the form of diagram to use for nearly all classes of representation. This makes it difficult to compare reports of different investigators on the same subject because their diagrams are not constructed alike." (William C Marshall, "Graphical methods for schools, colleges, statisticians, engineers and executives", 1921)

"One important aspect of reality is improvisation; as a result of special structure in a set of data, or the finding of a visualization method, we stray from the standard methods for the data type to exploit the structure or the finding." (William S Cleveland, "Visualizing Data", 1993)

"Making a presentation is a moral act as well as an intellectual activity. The use of corrupt manipulations and blatant rhetorical ploys in a report or presentation - outright lying, flagwaving, personal attacks, setting up phony alternatives, misdirection, jargon-mongering, evading key issues, feigning disinterested objectivity, willful misunderstanding of other points of view - suggests that the presenter lacks both credibility and evidence. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and a moral activity." (Edward R Tufte, "Beautiful Evidence", 2006)

"Making an evidence presentation is a moral act as well as an intellectual activity. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and a moral activity." (Edward R Tufte, "Beautiful Evidence", 2006)

"Creating effective visualizations is hard. Not because a dataset requires an exotic and bespoke visual representation - for many problems, standard statistical charts will suffice. And not because creating a visualization requires coding expertise in an unfamiliar programming language [...]. Rather, creating effective visualizations is difficult because the problems that are best addressed by visualization are often complex and ill-formed. The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"There is often no one 'best' visualization, because it depends on context, what your audience already knows, how numerate or scientifically trained they are, what formats and conventions are regarded as standard in the particular field you’re working in, the medium you can use, and so on. It’s also partly scientific and partly artistic, so you get to express your own design style in it, which is what makes it so fascinating." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

📉Graphical Representation: Scales (Just the Quotes)

"For a curve the vertical scale. whenever practicable, should be so selected that the zero line will appear on the diagram. [...] If the zero line of the vertical scale will not normally appear on the curve diagram, the zero line should be shown by the use of a horizontal break in the diagram." (Joint Committee on Standards for Graphic Presentation, "Publications of the American Statistical Association" Vol.14 (112), 1915)

"If only one scale is used, it should be placed at the left-hand side of the chart. In very large charts it is sometimes desirable to repeat the scale at the right-hand side as well. Where two different units of measurement are used in the scales, the units should be carefully named so that there will be no danger of the reader's using the right-hand and the left-hand scales interchangeably as though they represented the same unit." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"It should be a strict rule for all kinds of curve plotting that the horizontal scale must be used. for the independent variable and the vertical scale for the dependent variable. When the curves are plotted by this rule the reader can instantly select a set of conditions from the horizontal scale and read the information from the vertical scale. If there were no rule relating to the arrangement of scales for the independent and dependent variables, the reader would never be able to tell whether he should approach a chart from the vertical scale and read the information from the horizontal scale, or the reverse." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"Sometimes the scales of these accompanying charts are so large that the reader is puzzled to get clearly in his mind what the whole chart is driving at. There is a possibility of making a simple chart on such a large scale that the mere size of the chart adds to its complexity by causing the reader to glance from one side of the chart to the other in trying to get a condensed visualization of the chart." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919) 

"The scales of any curve-chart should be so selected that the chart will not be exaggerated in either the horizontal or the vertical direction. It is possible to cause a visual exaggeration of data by carelessly or intentionally selecting a scale which unduly stretches the chart in either the horizontal or the vertical direction. Just as the English language can be used to exaggerate to the ear, so charts can exaggerate to the eye." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"The zero of the scale should appear on every chart, and should shown by a heavy line carried across the sheet. If this is not done the reader may assume the bottom of the sheet to be zero and so be mis- led. The scale should be graduated from zero to a little over the maximum figure to be plotted on the charts, so that there will be a space between the highest peak on the curve and the top of the chart." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Under certain conditions, however, the ordinary form of graphic chart is slightly misleading. It will be conceded that its true function is to portray comparative fluctuations. This result is practically secured when the factors or quantities compared are nearly of the same value or volume, but analysis will show that this is not accomplished when the amounts compared differ greatly in value or volume. [...] The same criticism applies to charts which employ or more scales for various curve. If the different scale are in proper proportion, the result is the same as with one scale, but when two or more scales are used which are not proportional an indication may be given with respect to comparative fluctuations which is absolutely false." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"When dealing with very large quantities it is not always practicable to use a scale which starts at zero, and is carried up by even steps to a figure representing the highest peak on the curve. Such a chart would either be too large for convenient handling, or else the scale would have to be condensed so that only very large fluctuations would be indicated on the curve. In a ease of this kind the best practice is to start the at zero, and just above this point draw a wavy line across the sheet to indicate that the scale is broken at this point. This line can be very easily drawn with an ordinary serrated edge ruler as used by many accountants. The scale starts again on the upper side of the wavy line at a figure a little lower than the lowest point on the curve, and is carried up by even steps to a figure a little above the highest point to be shown on the curve." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"With the ordinary scale, fluctuations in large factors are very noticeable, while relatively greater fluctuations in smaller factors are barely apparent. The semi-logarithmic scale permits the graphic representation of changes in every quantity on the same basis, without respect to the magnitude of the quantity itself. At the same time, it shows the actual value by reference to the numbers in the scale column. By indicating both absolute and relative value and changes to one scale, it combines the advantages of both the natural and percentage scale, without the disadvantages of either." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Admittedly a chart is primarily a picture, and for presentation purposes should be treated as such; but in most charts it is desirable to be able to read the approximate magnitudes by reference to the scales. Such reference is almost out of the question without some rulings to guide the eye. Second, the picture itself may be misleading without enough rulings to keep the eye 'honest'. Although sight is the most reliable of our senses for measuring (and most other) purposes, the unaided eye is easily deceived; and there are numerous optical illusions to prove it. A third reason, not vital, but still of some importance, is that charts without rulings may appear weak and empty and may lack the structural unity desirable in any illustration." (Kenneth W Haemer, "Hold That Line. A Plea for the Preservation of Chart Scale Ruling", The American Statistician Vol. 1 (1) 1947)

"[….] double-scale charts are likely to be misleading unless the two zero values coincide (either on or off the chart). To insure an accurate comparison of growth the scale intervals should be so chosen that both curves meet at some point. This treatment produces the effect of percentage relatives or simple index numbers with the point of juncture serving as the base point. The principal advantage of this form of presentation is that it is a short-cut method of comparing the relative change of two or more series without computation. It is especially useful for bringing together series that either vary widely in magnitude or are measured in different units and hence cannot be compared conveniently on a chart having only one absolute-amount scale. In general, the double scale treatment should not be used for presenting growth comparisons to the general reader." (Kenneth W Haemer, "Double Scales Are Dangerous", The American Statistician Vol. 2 (3) , 1948)

"[…] many readers are confused by the presence of two scales, and either use the wrong one or simply disregard both. Also, the general reader has the disconcerting habit of believing that because one curve is higher than another, it is also larger in magnitude. This leads to all sorts of misconceptions." (Kenneth W Haemer, "Double Scales Are Dangerous", The American Statistician Vol. 2 (3) , 1948)

"The ratio chart not only correctly represents relative changes but also indicates absolute amounts at the same time. Because of its distinctive structure, it is referred to as a semilogarithmic chart. The vertical axis is ruled logarithmically and the horizontal axis arithmetically. The continued narrowing of the spacings of the scale divisions on the vertical axis is characteristic of logarithmic rulings; the equal intervals on the horizontal axis are indicative of arithmetic rulings." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Logging size transforms the original skewed distribution into a more symmetrical one by pulling in the long right tail of the distribution toward the mean. The short left tail is, in addition, stretched. The shift toward symmetrical distribution produced by the log transform is not, of course, merely for convenience. Symmetrical distributions, especially those that resemble the normal distribution, fulfill statistical assumptions that form the basis of statistical significance testing in the regression model." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Logging skewed variables also helps to reveal the patterns in the data. […] the rescaling of the variables by taking logarithms reduces the nonlinearity in the relationship and removes much of the clutter resulting from the skewed distributions on both variables; in short, the transformation helps clarify the relationship between the two variables. It also […] leads to a theoretically meaningful regression coefficient." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The logarithmic transformation serves several purposes: (1) The resulting regression coefficients sometimes have a more useful theoretical interpretation compared to a regression based on unlogged variables. (2) Badly skewed distributions - in which many of the observations are clustered together combined with a few outlying values on the scale of measurement - are transformed by taking the logarithm of the measurements so that the clustered values are spread out and the large values pulled in more toward the middle of the distribution. (3) Some of the assumptions underlying the regression model and the associated significance tests are better met when the logarithm of the measured variables is taken." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The scales used are important; contracting or expanding the vertical or horizontal scales will change the visual picture. The trend lines need enough grid lines to obviate difficulty in reading the results properly. One must be careful in the use of cross-hatching and shading, both of which can create illusions. Horizontal rulings tend to reduce the appearance. while vertical lines enlarge it. In summary, graphs must be reliable, and reliability depends not only on what is presented but also on how it is presented." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"The time-series plot is the most frequently used form of graphic design. With one dimension marching along to the regular rhythm of seconds, minutes, hours, days, weeks, months, years, centuries, or millennia, the natural ordering of the time scale gives this design a strength and efficiency of interpretation found in no other graphic arrangement." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"It is common for positive data to be skewed to the right: some values bunch together at the low end of the scale and others trail off to the high end with increasing gaps between the values as they get higher. Such data can cause severe resolution problems on graphs, and the common remedy is to take logarithms. Indeed, it is the frequent success of this remedy that partly accounts for the large use of logarithms in graphical data display." (William S Cleveland, "The Elements of Graphing Data", 1985)

"When magnitudes are graphed on a logarithmic scale, percents and factors are easier to judge since equal multiplicative factors and percents result in equal distances throughout the entire scale." (William S Cleveland, "The Elements of Graphing Data", 1985)

"When the data are magnitudes, it is helpful to have zero included in the scale so we can see its value relative to the value of the data. But the need for zero is not so compelling that we should allow its inclusion to ruin the resolution of the data on the graph." (William S Cleveland, "The Elements of Graphing Data", 1985)

"The logarithm is one of many transformations that we can apply to univariate measurements. The square root is another. Transformation is a critical tool for visualization or for any other mode of data analysis because it can substantially simplify the structure of a set of data. For example, transformation can remove skewness toward large values, and it can remove monotone increasing spread. And often, it is the logarithm that achieves this removal." (William S Cleveland, "Visualizing Data", 1993)

"The rule is that a graph of a change in a variable with time should always have a vertical scale that starts with zero. Otherwise, it is inherently misleading." (Douglas A Downing & Jeffrey Clark, "Forgotten Statistics: A Self-Teaching Refresher Course", 1996)

"The more clues to meaning that are supplied elsewhere, the less the need for cluttersome scales." (Eric Meyer, "Designing Infographics", 1997) 

"Choose scales wisely, as they have a profound influence on the interpretation of graphs. Not all scales require that zero be included, but bar graphs and other graphs where area is judged do require it." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"The visual representation of a scale - an axis with ticks - looks like a ladder. Scales are the types of functions we use to map varsets to dimensions. At first glance, it would seem that constructing a scale is simply a matter of selecting a range for our numbers and intervals to mark ticks. There is more involved, however. Scales measure the contents of a frame. They determine how we perceive the size, shape, and location of graphics. Choosing a scale (even a default decimal interval scale) requires us to think about what we are measuring and the meaning of our measurements. Ultimately, that choice determines how we interpret a graphic." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Use a logarithmic scale when it is important to understand percent change or multiplicative factors. […] Showing data on a logarithmic scale can cure skewness toward large values." (Naomi B Robbins, "Creating More effective Graphs", 2005) 

"Use a scale break only when necessary. If a break cannot be avoided, use a full scale break. Taking logs can cure the need for a break." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"It is important to pay heed to the following detail: a disadvantage of logarithmic diagrams is that a graphical integration is not possible, i.e., the area under the curve (the integral) is of no relevance." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Another way to obscure the truth is to hide it with relative numbers. […] Relative scales are always given as percentages or proportions. An increase or decrease of a given percentage only tells us part of the story, however. We are missing the anchoring of absolute values." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"One way a chart can lie is through overemphasis of the size and scale of items, particularly when the dimension of depth isnʼt considered." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Color can tell us where to look, what to compare and contrast, and it can give us a visual scale of measure. Because color can be so effective, it is often used for multiple purposes in the same graphic - which can create graphics that are dazzling but difficult to interpret. Separating the roles that color can play makes it easier to apply color specifically for encouraging different kinds of visual thinking. [...] Choose colors to draw attention, to label, to show relationships (compare and contrast), or to indicate a visual scale of measure." (Felice C Frankel & Angela H DePace, "Visual Strategies", 2012)

"Geographic maps have the advantage of being true to scale - great for walking. Diagrams have the advantage of being easily imaged and remembered, often true to a non-pedestrian experience, and the ability to open up congestion, reduce empty space, and use real estate efficiently. Hybrids 'mapograms' ? - often have the disadvantages of both map and diagram with none of the corresponding advantages." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"Context (information that lends to better understanding the who, what, when, where, and why of your data) can make the data clearer for readers and point them in the right direction. At the least, it can remind you what a graph is about when you come back to it a few months later. […] Context helps readers relate to and understand the data in a visualization better. It provides a sense of scale and strengthens the connection between abstract geometry and colors to the real world." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Whichever scale is used to represent the data, it is important to keep it consistent in data presentations. The principles of clarity, precision, and efficiency are rarely met if the measurement scales change within tables." (John Hoffmann, "Principles of Data Management and Presentation", 2017) 

07 December 2011

📉Graphical Representation: Good Graphics (Just the Quotes)

"A good graphic must give the impression that its various parts all belong together. They must be arranged in such a way that the illustration looks like a single entity. A good graphic chart should be more than just the sum of its individual lines, shapes, and shades. It should be more than the individual bars in a bar chart, more than the pieces of a pie chart, more than the boxes in a flow chart. Unity requires the establishment of coherent relationships among the component parts of the drawing. These relationships can be depicted in a very direct manner through the use of connecting lines that serve to connect shapes." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Unlike some art forms. good graphics should be as concrete, geometrical, and representational as possible. A rectangle should be drawn as a rectangle, leaving nothing to the reader's imagination about what you are trying to portray. The various lines and shapes used in a graphic chart should be arranged so that it appears to be balanced. This balance is a result of the placement of shapes and lines in an orderly fashion." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Generally speaking, a good display is one in which the visual impact of its components is matched to their importance in the context of the analysis. Consider the issue of overplotting." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less. One of the great assets of graphical techniques is that they can convey large amounts of information in a small space." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984) 

"Of course statistical graphics, just like statistical calculations, are only as good as what goes into them. An ill-specified or preposterous model or a puny data set cannot be rescued by a graphic (or by calculation), no matter how clever or fancy. A silly theory means a silly graphic." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Good graphics can be spoiled by bad annotation. Labels must always be subservient to the information to be conveyed, and legibility should never be sacrificed for style. All the information on the sheet should be easy to read, and more important, easy to interpret. The priorities of the information should be clearly expressed by the use of differing sizes, weights and character of letters." (Bruce Robertson, "How to Draw Charts & Diagrams", 1988)

"Graphical illustrations should be simple and pleasing to the eye, but the presentation must remain scientific. In other words, we want to avoid those graphical features that are purely decorative while keeping a critical eye open for opportunities to enhance the scientific inference we expect from the reader. A good graphical design should maximize the proportion of the ink used for communicating scientific information in the overall display." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"Good graphic design is not a panacea for bad copy, poor layout or misleading statistics. If any one of these facets are feebly executed it reflects poorly on the work overall, and this includes bad graphs and charts." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"(1) Good data visualization is trustworthy: Is it reliable? Is the portrayal of the data and the subject faithful? Do the representation and presentation design have integrity? (2) Good data visualization is accessible: Is it usable? Is the portrayal of the data and the subject relevant? Is the representation and presentation design suitably understandable? (3) Good data visualization is elegant: Is it aesthetic? Is the representation and presentation design appealing?" (Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019)

"Graphic design is not just about making things look good. It is a powerful combination of form and function that uses visual elements to communicate a message. Form refers to the physical appearance of a design, such as its shape, color, and typography. Function refers to the purpose of a design, such as what it is trying to communicate or achieve. A good graphic design is both visually appealing and functional. It uses the right combination of form and function to communicate its message effectively. Graphic design is also a strategic and thoughtful craft. It requires careful planning and execution to create a design that is both effective and aesthetically pleasing." (Faith Aderemi, "The Essential Graphic Design Handbook", 2024)

05 December 2011

📉Graphical Representation: Venn Diagrams (Just the Quotes)

"[...] for merely theoretical purposes the rule of formation would be very simple. It would merely be to begin by drawing any closed figure, and then proceed [sic] to draw others, subject to the one condition that each is to intersect once and once only all the existing subdivisions produced by those which had gone before." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)

"[…] it must be noticed that these diagrams do not naturally harmonize with the propositions of ordinary life or ordinary logic. […] The great bulk of the propositions which we commonly meet with are founded, and rightly founded, on an imperfect knowledge of the actual mutual relations of the implied classes to one another. […] one very marked characteristic about these circular diagrams is that they forbid the natural expression of such uncertainty, and are therefore only directly applicable to a very small number of such propositions as we commonly meet with." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)

"[...] we can not readily break up a complicated problem into successive steps which can be taken independently. We have, in fact, to solve the problem first, by determining what are the actual mutual relations of the classes involved, and then to draw the circles to represent this final result; we cannot work step-by-step towards the conclusion by aid of our figures." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)

"Whereas the Eulerian plan endeavoured at once and directly to represent propositions, or relations of class terms to one another, we shall find it best to begin by representing only classes, and then proceed to modify these in some way so as to make them indicate what our propositions have to say. How, then, shall we represent all the subclasses which two or more class terms can produce? Bear in mind that what we have to indicate is the successive duplication of the number of subdivisions produced by the introduction of each successive term. and we shall see our way to a very important departure from the Eulerian conception. All that we have to do is to draw our figures, say circles, so that each successive one which we introduce shall intersect once, and once only, all the subdivisions already existing, and we then have what may be called a general framework indicating every possible combination producible by the given class terms." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)

"We endeavour to employ only symmetrical figures, such as should not only be an aid to reasoning, through the sense of sight, but should also be to some extent elegant in themselves." (John Venn, "Symbolic Logic", 1881)

"At the basis of our Symbolic Logic, however represented, whether by words by letters or by diagrams, we shall always find the same state of things. What we ultimately have to do is to break up the entire field before us into a definite number of classes or compartments which are mutually exclusive and collectively exhaustive." (John Venn, "Symbolic Logic" 2nd Ed., 1894)

"The best way of introducing this question will be to enquire a little more strictly whether it is really classes that we thus represent, or merely compartments into which classes may be put? […] The most accurate answer is that our diagrammatic subdivisions, or for that matter our symbols generally, stand for compartments and not for classes. We may doubtless regard them as representing the latter, but if we do so we should never fail to keep in mind the proviso, 'if there be such things in existence'. And when this condition is insisted upon, it seems as if we expressed our meaning best by saying that what our symbols stand for are compartments which may or may not happen to be occupied." (John Venn, "Symbolic Logic" 2nd Ed., 1894)

"A Venn diagram is a simple representation of the sample space, that is often helpful in seeing 'what is going on'. Usually the sample space is represented by a rectangle, with individual regions within the rectangle representing events. It is often helpful to imagine that the actual areas of the various regions in a Venn diagram are in proportion to the corresponding probabilities. However, there is no need to spend a long time drawing these diagrams - their use is simply as a reminder of what is happening." (Graham Upton & Ian Cook, "Introducing Statistics", 2001)

"Two types of graphic organizers are commonly used for comparison: the Venn diagram and the comparison matrix [...] the Venn diagram provides students with a visual display of the similarities and differences between two items. The similarities between elements are listed in the intersection between the two circles. The differences are listed in the parts of each circle that do not intersect. Ideally, a new Venn diagram should be completed for each characteristic so that students can easily see how similar and different the elements are for each characteristic used in the comparison." (Robert J. Marzano et al, "Classroom Instruction that Works: Research-based strategies for increasing student achievement", 2001)

"The notion of outcomes covering a space is a very useful mental image, as it ties in strongly with the use of Venn diagrams and tables for clarifying the nature of possible events resulting from a trial. There are two important aspects to this. First, when enumerating the various outcomes that comprise an event, the number of (equally likely) outcomes should correspond, visually, with the area of that part of the diagram represented by the event in question - the greater the probability, the larger the area. Secondly, where events overlap (for example, when rolling a die, consider the two events 'getting an even score' and 'getting a score greater than 2'), the various regions in the Venn diagram help to clarify the various combinations of events that might occur." (Alan Graham, "Developing Thinking in Statistics", 2006)
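
The die example in the quote above can be worked through as a minimal sketch, assuming a fair six-sided die. The region names and the `probability` helper are illustrative choices, not part of the quoted text; each region's probability is the share of equally likely outcomes it contains, which is the "area" it would occupy in the diagram.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed).
outcomes = set(range(1, 7))

# The two overlapping events named in the quote.
even = {n for n in outcomes if n % 2 == 0}
greater_than_2 = {n for n in outcomes if n > 2}

# The four regions of the two-event Venn diagram.
regions = {
    "even only": even - greater_than_2,
    "both": even & greater_than_2,
    "greater than 2 only": greater_than_2 - even,
    "neither": outcomes - (even | greater_than_2),
}


def probability(event):
    """Probability of an event as a fraction of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))


for name, region in regions.items():
    print(f"{name}: {sorted(region)} -> {probability(region)}")
```

The four regions do not overlap and together cover the sample space, so their probabilities sum to 1.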

📉Graphical Representation: Tools (Just the Quotes)

"Recognize effective results. Does the type of chart selected give a comprehensive picture of the situation? Does the size of chart and visual aid used satisfy all audience requirements? Do materials meet all reproduction problems? Is the layout well balanced and style of lettering uniform? Does the chart as a whole accurately present the facts? Is the projected idea an effective visual tool?" (Mary E Spear, "Charting Statistics", 1952)

"The grid with the vertical ruling carrying the logarithmic scale and the horizontal ruling carrying the arithmetic scale denoting time is the most common. The reverse may be used, and the horizontal ruling may carry the log scale. Charts of this type are frequently referred to as 'semilog charts'. [...] The full or double log scale (with the log grid carried on both horizontal and vertical rulings) is used mostly for statistical study and economic analysis and is not a good tool for popular presentation of data." (Mary E Spear, "Charting Statistics", 1952)
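
One reason the semilog grid suits time series is that a logarithmic vertical scale maps equal ratios to equal distances. The sketch below, using a made-up series that doubles each period, computes the vertical positions such a scale would assign; the data and the helper name `log_positions` are hypothetical and serve only to illustrate the idea.

```python
import math


def log_positions(values):
    """Vertical positions of positive values on a base-10 log scale."""
    return [math.log10(v) for v in values]


# Hypothetical series that doubles every period: 100, 200, 400, 800, 1600.
series = [100 * 2 ** t for t in range(5)]

positions = log_positions(series)

# Vertical distance between consecutive points on the log scale.
steps = [b - a for a, b in zip(positions, positions[1:])]

for value, pos in zip(series, positions):
    print(f"{value:>5} -> {pos:.4f}")
print("steps:", [round(s, 4) for s in steps])
```

Every step has the same size, so a constant rate of growth plots as a straight line on a semilog chart, whereas on an arithmetic scale the same series would curve upward.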

"Graphic forms help us to perform and influence two critical functions of the mind: the gathering of information and the processing of that information. Graphs and charts are ways to increase the effectiveness and the efficiency of transmitting information in a way that enhances the reader's ability to process that information. Graphics are tools to help give meaning to information because they go beyond the provision of information and show relationships, trends, and comparisons. They help to distinguish which numbers and which ideas are more important than others in a presentation." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"The square has always had a no-nonsense sort of image. Stable, solid, and - well - square. Perhaps that's why it is the shape used in business visuals in those rare cases where a visual is even bothered with. Flip through most business books and you'll find precious few places for your eye to stop and your visual brain to engage. But when you do, the shape of the graphic, chart, matrix, table, or diagram is certainly square. It's a comfortable shape, which makes it a valuable implement in your kit of visual communication tools." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"The triangle is one of the best tools for visualizing a problem. Every difficult problem I've encountered in business breaks down into pieces, which carry different weight and importance. The pieces with the most importance sit at the top of the triangle, which progresses down to the sometimes thorny but less important piece at the base." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"Visual thinking can begin with the three basic shapes we all learned to draw before kindergarten: the triangle, the circle, and the square. The triangle encourages you to rank parts of a problem by priority. When drawn into a triangle, these parts are less likely to get out of order and take on more importance than they should. While the triangle ranks, the circle encloses and can be used to include and/or exclude. Some problems have to be enclosed to be managed. Finally, the square serves as a versatile problem-solving tool. By assigning it attributes along its sides or corners, we can suddenly give a vague issue a specific place to live and to move about." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"When visualization tools act as a catalyst to early visual thinking about a relatively unexplored problem, neither the semantics nor the pragmatics of map signs is a dominant factor. On the other hand, syntactics (or how the sign-vehicles, through variation in the visual variables used to construct them, relate logically to one another) are of critical importance." (Alan M MacEachren, "How Maps Work: Representation, Visualization, and Design", 1995)

"Good numeric representation is a key to effective thinking that is not limited to understanding risks. Natural languages show the traces of various attempts at finding a proper representation of numbers. [...] The key role of representation in thinking is often downplayed because of an ideal of rationality that dictates that whenever two statements are mathematically or logically the same, representing them in different forms should not matter. Evidence that it does matter is regarded as a sign of human irrationality. This view ignores the fact that finding a good representation is an indispensable part of problem solving and that playing with different representations is a tool of creative thinking." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"To analyze means to untangle. Even when we 'let the data speak for themselves', we need to untangle some aspect of the data before displaying things in a graphic. The more analytics we can include in the process of displaying graphics, the more flexibility our tools will have." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Graphics, charts, and maps aren’t just tools to be seen, but to be read and scrutinized. The first goal of an infographic is not to be beautiful just for the sake of eye appeal, but, above all, to be understandable first, and beautiful after that; or to be beautiful thanks to its exquisite functionality." (Alberto Cairo, "The Functional Art", 2011)

"The first and main goal of any graphic and visualization is to be a tool for your eyes and brain to perceive what lies beyond their natural reach." (Alberto Cairo, "The Functional Art", 2011)

"[...] communicating with data is less often about telling a specific story and more like starting a guided conversation. It is a dialogue with the audience rather than a monologue. While some data presentations may share the linear approach of a traditional story, other data products (analytical tools, in particular) give audiences the flexibility for exploration. In our experience, the best data products combine a little of both: a clear sense of direction defined by the author with the ability for audiences to focus on the information that is most relevant to them. The attributes of the traditional story approach combined with the self-exploration approach leads to the guided safari analogy." (Zach Gemignani et al, "Data Fluency", 2014)

"Creating a data fluent organization doesn’t just happen. It starts with people who love using data as a tool to improve their job performance - people who have learned to converse with others in the language of data. It needs people who expect and demand better, more useful data products from themselves and others. It starts with you." (Zach Gemignani et al, "Data Fluency", 2014)

"Key Performance Indicators (KPIs) in many organizations are a broken tool. The KPIs are often a random collection prepared with little expertise, signifying nothing. [...] KPIs should be measures that link daily activities to the organization’s critical success factors (CSFs), thus supporting an alignment of effort within the organization in the intended direction." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"There is a story in your data. But your tools don’t know what that story is. That’s where it takes you - the analyst or communicator of the information - to bring that story visually and contextually to life." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"Commonly, data do not make a clear and unambiguous statement about our world, often requiring tools and methods to provide such clarity. These methods, called statistical data analysis, involve collecting, manipulating, analyzing, interpreting, and presenting data in a form that can be used, understood, and communicated to others." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"Exploring data generates hypotheses about patterns in our data. The visualizations and tools of dynamic interactive graphics ease and improve the exploration, helping us to 'see what our data seem to say'." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"A performance dashboard is a practical tool to improve management effectiveness and efficiency, not just a pretty retrospective picture in an annual report." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"Color is difficult to use effectively. A small number of well-chosen colors can be highly distinguishable, particularly for categorical data, but it can be difficult for users to distinguish between more than a handful of colors in a visualization. Nonetheless, color is an invaluable tool in the visualization toolbox because it is a channel that can carry a great deal of meaning and be overlaid on other dimensions. […] There are a variety of perceptual effects, such as simultaneous contrast and color deficiencies, that make precise numerical judgments about a color scale difficult, if not impossible." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Maps also have the disadvantage that they consume the most powerful encoding channels in the visualization toolbox - position and size - on an aspect that is held constant. This leaves less effective encoding channels like color for showing the dimension of interest." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

04 December 2011

📉Graphical Representations: Dashboards (Just the Quotes)

"The real value of dashboard products lies in their ability to replace hunt‐and‐peck data‐gathering techniques with a tireless, adaptable, information‐flow mechanism. Dashboards transform data repositories into consumable information." (Gregory L Hovis, "Stop Searching for Information - Monitor it with Dashboard Technology," DM Direct, 2002)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"Dashboards aren't all that different from some of the other means of presenting information, but when properly designed the single-screen display of integrated and finely tuned data can deliver insight in an especially powerful way." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"An effective dashboard is the product not of cute gauges, meters, and traffic lights, but rather of informed design: more science than art, more simplicity than dazzle. It is, above all else, about communication." (Stephen Few, "Information Dashboard Design", 2006)

"Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations. No matter how great the technology, a dashboard's success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately. Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think." (Stephen Few, "Information Dashboard Design", 2006) 

"Having a purposeless or poorly performing dashboard is more common than not. This happens when the underlying architecture is not designed properly to support the needs of dashboard interaction. There is an obvious disconnect between the design of the data warehouse and the design of the dashboards. The people who design the data warehouse do not know what the dashboard will do; and the people who design the dashboards do not know how the data warehouse was designed, resulting in a lack of cohesion between the two. A similar disconnect can also exist between the dashboard designer and the business analyst, resulting in a dashboard that may look beautiful and dazzling but brings very little business value." (Nils H Rasmussen et al, "Business Dashboards: A visual catalog for design and deployment", 2009)

"In general, it still holds true that 'there is no such thing as a free lunch'. What this means is that the most advanced dashboard solutions with the most features and flexibility are generally also the technologies that require more setup and more skill sets from the administrators and the end users. In some cases companies 'dumb down' their dashboard application in the initial stages of deployment so as not to scare their users with too many options. Later, when a dashboard culture has developed, they open up more of the functionality." (Nils H Rasmussen et al, "Business Dashboards: A visual catalog for design and deployment", 2009)

"There are myriad questions that we can ask from data today. As such, it’s impossible to write enough reports or design a functioning dashboard that takes into account every conceivable contingency and answers every possible question." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"A dashboard is like the executive summary of a report. We read executive summaries and skip the body of the report if the summary is more or less in line with our expectations. Trouble is, measurement is never exhaustive. It is only when we dive in that we realize what areas may have been missed." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"[…] an overall green status indicator doesn’t mean anything most of the time. All it says is that the things under measurement seem okay. But there always will be many more things not under measurement. To celebrate green indicators is to ignore the unknowns. […] The tendency to roll up metrics into dashboards promotes ignorance of the real situation on the ground. We forget that we only see what is under measurement. We only act when something is not green." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"Rolling up fine-grained metrics to create high-level dashboards puts pressure on teams to keep the fine-grained metrics green even when it might not be the best use of their time." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"A performance dashboard is a practical tool to improve management effectiveness and efficiency, not just a pretty retrospective picture in an annual report." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"All human storytellers bring their subjectivity to their narratives. All have bias, and possibly error. Acknowledging and defusing that bias is a vital part of successfully using data stories. By debating a data story collaboratively and subjecting it to critical thinking, organizations can get much higher levels of engagement with data and analytics and impact their decision making much more than with reports and dashboards alone." (James Richardson, 2017)

"Dashboards are a type of multiform visualization used to summarize and monitor data. These are most useful when proxies have been well validated and the task is well understood. This design pattern brings a number of carefully selected attributes together for fast, and often continuous, monitoring - dashboards are often linked to updating data streams. While many allow interactivity for further investigation, they typically do not depend on it. Dashboards are often used for presenting and monitoring data and are typically designed for at-a-glance analysis rather than deep exploration and analysis." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Infographics combine art and science to produce something that is not unlike a dashboard. The main difference from a dashboard is the subjective data and the narrative or story, which enhances the data-driven visual and engages the audience quickly through highlighting the required context." (Travis Murphy, "Infographics Powered by SAS®: Data Visualization Techniques for Business Reporting", 2018)

"Dashboards are collections of several linked visualizations all in one place. The idea is very popular as part of business intelligence: having current data on activity summarized and presented all in one place. One danger of cramming a lot of disparate information into one place is that you will quickly hit information overload. Interactivity and small multiples are definitely worth considering as ways of simplifying the information a reader has to digest in a dashboard. As with so many other visualizations, layering the detail for different readers is valuable." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"[Dashboards] are popular methods for displaying multiple visualizations and statistical information. Dashboards often take the form of some organizational instrument that offers both at-a-glance and detailed views of many different analytical and information dimensions. Dashboards are not a unique chart type themselves, but rather should be considered compositions that comprise multiple chart types." (Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019)

"Understanding the entire data ecosystem, from the production of a data point to its consumption in a dashboard or a visualization, provides the ability to invoke action, which is more valuable than the mere sum of its parts." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"A well-designed dashboard needs to provide a similar experience; information cannot be placed just anywhere on the dashboard. Charts that relate to one another are usually positioned close to one another. Important charts often appear larger and more visually prominent than less important ones. In other words, there are natural sizes for how a dashboard comprises charts based on the task and context." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"As we enter into certain types of analytical conversations, we expect the conversations to flow in a predictable and cohesive manner. A KPI dashboard, for example, uses redundant structures across specific dimensions or measures to convey information. A dashboard with a top-down exposition style provides high-level information first and clarifies downward, while a bottom-up dashboard starts with the details and clarifies them against the larger picture." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Chart choices can also create weight within the entire composition. Presenting information as a comprehensive visualization, such as in a dashboard, requires thinking beyond individual charts. In writing, we not only craft sentences, but write the composition as an entire piece. Certain sentences may drive the writing more, but all sentences play a role in conveying the message." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"The sizes of charts in space reflect how we convey information to a reader. In a dashboard context, the content, size, and space that the various charts occupy should reflect the form and function of the main message. As you saw with the bento box metaphor from the introduction, there needs to be deliberate thought put into the placement and size of each individual chart so that they all work together in harmony." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"When integrating written text with charts in a functionally aesthetic way, the reader should be able to find the key takeaways from the chart or dashboard, taking into account the context, constraints, and reading objectives of the overall message."  (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

📉Graphical Representation: Information Design (Just the Quotes)

"The ducks of information design are false escapes from flatland, adding pretend dimensions to impoverished data sets, merely fooling around with information." (Edward R Tufte, "Envisioning Information", 1990)

"We envision information in order to reason about, communicate, document, and preserve that knowledge - activities nearly always carried out on two-dimensional paper and computer screen. Escaping this flatland and enriching the density of data displays are the essential tasks of information design." (Edward R Tufte, "Envisioning Information", 1990)

"Good information design is clear thinking made visible, while bad design is stupidity in action." (Edward Tufte, "Visual Explanations", 1997)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"Information design is defined as the art and science of preparing information so that it can be used by human beings with efficiency and effectiveness. Its primary objectives are: To develop documents that are comprehensible, rapidly and accurately retrievable, and easy to translate into effective actions [...]" (Sheila Pontis, "La historia de la esquematica en la visualization de datos", 2007)

"I feel that every day, all of us now are being blasted by information design. It's being poured into our eyes through the Web, and we're all visualizers now; we're all demanding a visual aspect to our information. There's something almost quite magical about visual information. It's effortless; it literally pours in." (David McCandless, "The beauty of data visualization", TEDGlobal, 2010) 

"The composing of intelligible patterns from the noise of raw data is a hallmark of a good information designer. The most successful examples extract and present essential relationships in a coherent manner while limiting the obtrusiveness of accessory relationships. Effective results are self-evident whereby the information graphic is absorbed by the mind holistically." (William A Anderson & William M Bevington, "Complications and Adjacencies: An Organizing Logic for Information Graphics", Parsons Journal of Information Mapping Vol. II(3), 2010)

"Information design, when successful - whether in print, on the web, or in the environment - represents the functional balance of the meaning of the information, the skills and inclinations of the designer, and the perceptions, education, experience, and needs of the audience." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"Successful information design in movement systems gives the user the information he needs - and only the information he needs - at every decision point." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012) 

"Information design is a design practice concerned with the presentation of information. It is often associated with the activities of data visualization; indeed sometimes it is presented as the major field in which data visualization belongs. Unquestionably, both share an underlying motive to facilitate understanding. However, in my view, information design has a much broader application concerned with the design of many different forms of visual communication, particularly those with an instructional or functional slant, such as way-finding devices like hospital building maps or in the design of utility bills." (Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019)

✏️Antony Unwin - Collected Quotes

"Deciding on which graphics to use is often a matter of taste. What one person thinks are good graphics for illustrating information may not appeal to someone else. It may also happen that different people interpret the same graphic in quite different ways." (Antony Unwin [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Clearly principles and guidelines for good presentation graphics have a role to play in exploratory graphics, but personal taste and individual working style also play important roles. The same data may be presented in many alternative ways, and taste and customs differ as to what is regarded as a good presentation graphic. Nevertheless, there are principles that should be respected and guidelines that are generally worth following. No one should expect a perfect consensus where graphics are concerned." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"Data visualization [...] expresses the idea that it involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid readers or viewers in seeing the structure in the data. The term data visualization is related to the new field of information visualization. This includes visualization of all kinds of information, not just of data, and is closely associated with research by computer scientists." (Antony Unwin et al, "Introduction" [in "Handbook of Data Visualization"], 2008) 

"For a given dataset there is not a great deal of advice which can be given on content and context. Those who know their own data should know best for their specific purposes. It is advisable to think hard about what should be shown and to check with others if the graphic makes the desired impression. Design should be left to designers, though some basic guidelines should be followed: consistency is important (sets of graphics should be in similar style and use equivalent scaling); proximity is helpful (place graphics on the same page, or on the facing page, of any text that refers to them); and layout should be checked (graphics should be neither too small nor too large and be attractively positioned relative to the whole page or display)." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"There are two main reasons for using graphic displays of datasets: either to present or to explore data. Presenting data involves deciding what information you want to convey and drawing a display appropriate for the content and for the intended audience. [...] Exploring data is a much more individual matter, using graphics to find information and to generate ideas. Many displays may be drawn. They can be changed at will or discarded and new versions prepared, so generally no one plot is especially important, and they all have a short life span." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"Eye-catching data graphics tend to use designs that are unique (or nearly so) without being strongly focused on the data being displayed. In the world of Infovis, design goals can be pursued at the expense of statistical goals. In contrast, default statistical graphics are to a large extent determined by the structure of the data (line plots for time series, histograms for univariate data, scatterplots for bivariate nontime-series data, and so forth), with various conventions such as putting predictors on the horizontal axis and outcomes on the vertical axis. Most statistical graphs look like other graphs, and statisticians often think this is a good thing." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Providing the right comparisons is important, numbers on their own make little sense, and graphics should enable readers to make up their own minds on any conclusions drawn, and possibly see more. On the Infovis side, computer scientists and designers are interested in grabbing the readers' attention and telling them a story. When they use data in a visualization (and data-based graphics are only a subset of the field of Infovis), they provide more contextual information and make more effort to awaken the readers' interest. We might argue that the statistical approach concentrates on what can be got out of the available data and the Infovis approach uses the data to draw attention to wider issues. Both approaches have their value, and it would probably be best if both could be combined." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Statisticians tend to use standard graphic forms (e.g., scatterplots and time series), which enable the experienced reader to quickly absorb lots of information but may leave other readers cold. We personally prefer repeated use of simple graphical forms, which we hope draw attention to the data rather than to the form of the display." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"[…] we do see a tension between the goal of statistical communication and the more general goal of communicating the qualitative sense of a dataset. But graphic design is not on one side or another of this divide. Rather, design is involved at all stages, especially when several graphics are combined to contribute to the overall picture, something we would like to see more of." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)

"Yes, it can sometimes be possible for a graph to be both beautiful and informative […]. But such synergy is not always possible, and we believe that an approach to data graphics that focuses on celebrating such wonderful examples can mislead people by obscuring the tradeoffs between the goals of visual appeal to outsiders and statistical communication to experts." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013) 

03 December 2011

📉Graphical Representation: Charts vs. Thousand Words (Just the Quotes)

"The drawing shows me at a glance what would be spread over ten pages in a book." (Ivan Turgenev, 1862) [2]

"Sometimes, half a dozen figures will reveal, as with a lightning-flash, the importance of a subject which ten thousand labored words with the same purpose in view, had left at last but dim and uncertain." (Mark Twain, "Life on the Mississippi", 1883) 

"One good picture is worth many pages of written description." (William Sproston Caine, 1891) [2]

"One look is worth a thousand words" (Kathleen Caffyn, 1903) 

"Use a picture. It's worth a thousand words." (Arthur Brisbane, The Post-Standard, 1911)

"One Look Is Worth A Thousand Words" ([advertisement] 1913)

"A picture is worth ten thousand words. If you can’t see the truth in these pictures you are among the vast majority that must learn only by experience." (Arthur Brisbane, 1915)

"One picture is worth ten thousand words." (Frederick R Barnard, Printer’s Ink, 1921)

"One Picture Worth Ten Thousand Words" ([Chinese proverb] 1927)

"In many instances, a picture is indeed worth a thousand words. To make this true in more diverse circumstances, much more creative effort is needed to pictorialize the output from data analysis. Naive pictures are often extremely helpful, but more sophisticated pictures can be both simple and even more informative." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"Graphic charts are ways of presenting quantitative as well as qualitative information in an efficient and effective visual form. Numbers and ideas presented graphically are often more easily understood, remembered, and integrated than when they are presented in narrative or tabular form. Descriptions, trends, relationships, and comparisons can be made more apparent. Less time is required to present and comprehend information when graphic methods are employed. As the old truism states, 'One picture is worth a thousand words.'" (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"One word is worth a thousand pictures. If it's the right word." (Edward Abbey, "Beyond the Wall: Essays from the Outside", 1984)

"A picture may be worth a thousand words, a formula is worth a thousand pictures." (Edsger Dijkstra, [conference at ETH Zurich] 1994)

"A magnificent picture is never worth a thousand perfect words." (John Dunning, "The Bookman's Wake", 1995)

"A picture tells a thousand words. But you get a thousand pictures from someone's voice." (Paul Fleischman, "Seek", 2001)

"If a picture is worth a thousand words, a metaphor is worth a thousand pictures." (Daniel H Pink, "A Whole New Mind: Why Right-Brainers Will Rule the Future", 2005)

"The amount of information rendered in a single financial graph is easily equivalent to thousands of words of text or a page-sized table of raw values. A graph illustrates so many characteristics of data in a much smaller space than any other means. Charts also allow us to tell a story in a quick and easy way that words cannot." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Visual reports exploit the idea that a picture is worth a thousand words and, in particular, for many tasks a picture is more useful than a large table of numbers." (Stephen G Eick, "Graph Drawing for Data Analytics" [in "Handbook of Graph Drawing and Visualization"] , 2013)

"Graphs can help us interpret data and draw inferences. They can help us see tendencies, patterns, trends, and relationships. A picture can be worth not only a thousand words, but a thousand numbers. However, a graph is essentially descriptive - a picture meant to tell a story. As with any story, bumblers may mangle the punch line and the dishonest may lie." (Gary Smith, "Standard Deviations", 2014)

"The caption should explain what is shown, possibly also giving the data source. Captions should be detailed enough that the graphic can pretty well stand on its own. Longer is usually better than shorter. A picture may be worth a thousand words, but you need at least some words to describe and explain it." (Antony Unwin, "Graphical Data Analysis with R", 2015)

"A picture may be worth a thousand words, but not all pictures are readable, interpretable, meaningful, or relevant." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"A recurring theme in machine learning is combining predictions across multiple models. There are techniques called bagging and boosting which seek to tweak the data and fit many estimates to it. Averaging across these can give a better prediction than any one model on its own. But here a serious problem arises: it is then very hard to explain what the model is (often referred to as a 'black box'). It is now a mixture of many, perhaps a thousand or more, models." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"'A picture is worth a thousand words' is definitely true, and graphs can help you tell a story about your data that would otherwise go untold with only numerical summaries and statistics. While inferential statistics and effect size measures can help us draw relatively reliable conclusions from our data, graphs and visualizations can help make the scientific findings accessible to virtually anyone, even with minimal coursework in statistics or data science." (Daniel J Denis, "Univariate, Bivariate, and Multivariate Statistics Using R: Quantitative Tools for Data Analysis and Data Science", 2020)

"Although a picture may be worth a thousand words, a single static picture is in most cases insufficient for a valid analysis and for understanding of a complex subject. It is usual that an analyst needs to see different aspects or parts of data and look at the data from different perspectives. This means that the analyst needs to interact with the data and with the system that generates visual displays of the data: select data components and subsets for viewing, select and tune visualization techniques, transform the views, transform the data, and so on." (Natalia Andrienko et al, "Visual Analytics for Data Scientists", 2020)

"A picture really can be worth a thousand words, and human beings are adept at extracting useful information from visual presentations. Modern data analysis increasingly relies on graphical presentations to uncover meaning and convey results." (Robert I Kabacoff, "R in Action: Data analysis and graphics with R and Tidyverse", 2022)

"A good metaphor is worth a thousand pictures." (Anon) 

"As the Chinese say, 1001 words is worth more than a picture." (John McCarthy [source]) 

References:
[1] Wikipedia (2024) A picture is worth a thousand words [link]
[2] Quote Investigator (2022) A Picture Is Worth Ten Thousand Words [link]


SQL Server New Features: Window Functions

Introduction

     In the past, in the absence of or in parallel with other techniques, aggregate functions proved quite useful for solving several types of problems that involve retrieving the first/last record or displaying details together with averages and other aggregates. Typically their use involves one or more joins between a dataset and an aggregation based on the same dataset or a subset of it. An aggregation can involve one or more columns that make the object of the analysis. Sometimes multiple such aggregations are needed, based on different sets of columns, and each such aggregation involves at least one join. Such queries can become quite complex, though that was the price to pay in order to solve such problems.

Partitions

     The introduction of analytic functions in Oracle and of window functions, a similar concept, in SQL Server allowed such problems to be approached from a different, simplified perspective. Central to this feature is the partition (of a dataset), its meaning being the same as that of a mathematical partition of a set: a division of a set into non-overlapping and non-empty parts that cover the whole initial set. The concept of a partition is not necessarily new, as the columns used in a GROUP BY clause (implicitly) determine a partition of a dataset. The difference with analytic/window functions is that the partition is defined explicitly, inline, together with a ranking or aggregate function evaluated within each partition. If the concept of partition is difficult to grasp, let’s look at the result set based on two Products (the examples are based on the AdventureWorks database):
 
-- Price Details for 2 Products 
SELECT A.ProductID  
, A.StartDate 
, A.EndDate 
, A.StandardCost  
FROM [Production].[ProductCostHistory] A 
WHERE A.ProductID IN (707, 708) 
ORDER BY A.ProductID 
, A.StartDate 

window function - details

   In this case one partition is “created” based on the first Product (ProductID = 707), while a second partition is based on the second Product (ProductID = 708). As an aside, another partitioning could be created based on ProductID and StartDate; considering that the two attributes form a key in the table, this would split the dataset into partitions of exactly one record each.
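
   The two partitionings mentioned above can be made visible by counting the records in each partition. The following is only a minimal sketch, assuming the same AdventureWorks table as the other examples; the aliases RecordsPerProduct and RecordsPerKey are names I made up for illustration:

```sql
-- Partition sizes for 2 Products (sketch; assumes the AdventureWorks database)
SELECT A.ProductID
, A.StartDate
, A.StandardCost
-- number of records in the partition defined by ProductID
, COUNT(*) OVER(PARTITION BY A.ProductID) RecordsPerProduct
-- ProductID and StartDate form a key, so each such partition holds one record
, COUNT(*) OVER(PARTITION BY A.ProductID, A.StartDate) RecordsPerKey
FROM [Production].[ProductCostHistory] A
WHERE A.ProductID IN (707, 708)
ORDER BY A.ProductID
, A.StartDate
```

   If the key assumption holds, RecordsPerKey should be 1 on every row, while RecordsPerProduct repeats the size of the Product's partition on each of its records.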

Details and Averages

     In order to exemplify the use of simple versus window aggregate functions, let’s consider a problem in which the Standard Cost details need to be displayed together with the Average Standard Cost for each ProductID. When a GROUP BY clause is applied in order to retrieve the Average Standard Cost, the query takes the form: 

-- Average Price for 2 Products 
SELECT A.ProductID  
, AVG(A.StandardCost) AverageStandardCost 
FROM [Production].[ProductCostHistory] A 
WHERE A.ProductID IN (707, 708) 
GROUP BY A.ProductID  
ORDER BY A.ProductID 

window function - GROUP BY 

    In order to retrieve the details as well, the query can be written with the help of a JOIN as follows:

-- Price Details with Average Price for 2 Products - using JOINs 
SELECT A.ProductID  
, A.StartDate 
, A.EndDate 
, A.StandardCost 
, B.AverageStandardCost 
, A.StandardCost - B.AverageStandardCost DiffStandardCost 
FROM [Production].[ProductCostHistory] A    
  JOIN ( -- average price        
    SELECT A.ProductID         
    , AVG(A.StandardCost) AverageStandardCost         
    FROM [Production].[ProductCostHistory] A        
    WHERE A.ProductID IN (707, 708)        
    GROUP BY A.ProductID      
) B  
    ON A.ProductID = B.ProductID 
WHERE A.ProductID IN (707, 708) 
ORDER BY A.ProductID 
, A.StartDate 

 window function - Average Price JOIN   

    As pointed out above, the partition is defined by ProductID. The same query written with window functions becomes:

-- Price Details with Average Price for 2 Products - using AVG window function 
SELECT A.ProductID  
, A.StartDate 
, A.EndDate 
, A.StandardCost 
, AVG(A.StandardCost) OVER(PARTITION BY A.ProductID) AverageStandardCost 
, A.StandardCost - AVG(A.StandardCost) OVER(PARTITION BY A.ProductID) DiffStandardCost 
FROM [Production].[ProductCostHistory] A 
WHERE A.ProductID IN (707, 708) 
ORDER BY A.ProductID 
, A.StartDate 

window function - Average Price WF

    As can be seen in the second example, the AVG function is defined using the OVER clause with ProductID as the partition. Moreover, the function is used in a formula to calculate the difference between the Standard Cost and its average (DiffStandardCost). More complex formulas can be written by making use of multiple window functions.  
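
    As an illustration of a formula that combines several window functions, the query below derives the cost range per Product and the percentage deviation of each cost from the average. It is a sketch under the same AdventureWorks assumption; the column aliases are hypothetical, and NULLIF is used only to guard against a division by zero:

```sql
-- Price Details with range and % deviation for 2 Products - using several window functions
SELECT A.ProductID
, A.StartDate
, A.StandardCost
, MIN(A.StandardCost) OVER(PARTITION BY A.ProductID) MinStandardCost
, MAX(A.StandardCost) OVER(PARTITION BY A.ProductID) MaxStandardCost
-- range = greatest minus smallest cost within the partition
, MAX(A.StandardCost) OVER(PARTITION BY A.ProductID)
  - MIN(A.StandardCost) OVER(PARTITION BY A.ProductID) RangeStandardCost
-- percentage deviation from the partition's average
, 100 * (A.StandardCost - AVG(A.StandardCost) OVER(PARTITION BY A.ProductID))
  / NULLIF(AVG(A.StandardCost) OVER(PARTITION BY A.ProductID), 0) PctDiffStandardCost
FROM [Production].[ProductCostHistory] A
WHERE A.ProductID IN (707, 708)
ORDER BY A.ProductID
, A.StartDate
```

    All the functions share the same partition, so no join is needed; the equivalent query based on GROUP BY would have to compute the three aggregates in a subquery and join them back to the details.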

The Last Record

     Let’s consider the problem of retrieving the nth record. Because it is easier to retrieve the first or last record with aggregate functions, let’s suppose that the last Standard Cost needs to be retrieved for each ProductID. The aggregate function helps to retrieve the greatest StartDate, which in turn helps to retrieve the record containing the last Standard Cost.

-- Last Price Details for 2 Products - using JOINs 
SELECT A.ProductID  
, A.StartDate 
, A.EndDate 
, A.StandardCost 
FROM [Production].[ProductCostHistory] A  
    JOIN ( -- last start date          
    SELECT A.ProductID          
    , Max(A.StartDate) LastStartDate          
    FROM [Production].[ProductCostHistory] A          
    WHERE A.ProductID IN (707, 708)          
    GROUP BY A.ProductID      
) B      
   ON A.ProductID = B.ProductID  
  AND A.StartDate = B.LastStartDate 
WHERE A.ProductID IN (707, 708) 
ORDER BY A.ProductID 
, A.StartDate 

window function - Last Price JOIN  

With window functions the query can be rewritten as follows:

-- Last Price Details for 2 Products - using RANK window function 
SELECT * 
FROM (-- ordered prices      
    SELECT A.ProductID      
    , A.StartDate      
    , A.EndDate      
    , A.StandardCost      
    , RANK() OVER(PARTITION BY A.ProductID ORDER BY A.StartDate DESC) Ranking      
    FROM [Production].[ProductCostHistory] A     
    WHERE A.ProductID IN (707, 708) 
  ) A 
WHERE Ranking = 1 
ORDER BY A.ProductID 
, A.StartDate 

window function - Last Price WF  

   As can be seen, in order to retrieve the last Standard Cost the RANK function was used, the results being ordered descending by StartDate. Thus, the last Standard Cost will always be positioned on the first record of each partition. Because window functions can’t be used in WHERE clauses, the initial logic needs to be encapsulated in a subquery. The first Standard Cost could be retrieved similarly, this time ordering ascending by StartDate. The last query can be easily modified to retrieve the nth record or the first/last n records (this can prove to be more difficult with simple aggregate functions).
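
   A possible variation for the nth record is sketched below for n = 2, i.e. the second-to-last Standard Cost. I swapped RANK for ROW_NUMBER here, which is my choice rather than part of the query above: ROW_NUMBER numbers the records consecutively even when the ordering values tie, so at most one record per partition is returned. The alias RowNumber is made up for illustration:

```sql
-- Second-to-last Price Details for 2 Products - using ROW_NUMBER window function
SELECT B.ProductID
, B.StartDate
, B.EndDate
, B.StandardCost
FROM (-- prices numbered from the most recent one
    SELECT A.ProductID
    , A.StartDate
    , A.EndDate
    , A.StandardCost
    , ROW_NUMBER() OVER(PARTITION BY A.ProductID ORDER BY A.StartDate DESC) RowNumber
    FROM [Production].[ProductCostHistory] A
    WHERE A.ProductID IN (707, 708)
  ) B
WHERE B.RowNumber = 2 -- change to <= 2 for the last two records
ORDER BY B.ProductID
```

   Products with fewer than two records in the history are not returned by this query. Ordering ascending by StartDate instead gives the nth record counted from the beginning.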

Conclusion

    Without going too deep into details, I have shown above two representative scenarios in which solutions based on aggregate functions can be simplified by using window functions. In theory window functions provide greater flexibility, but they have their own trade-offs too. In the next posts I will attempt to further detail their use, especially in the context of Statistics.
