30 December 2006

✏️Colin Ware - Collected Quotes

"Why should we be interested in visualization? Because the human visual system is a pattern seeker of enormous power and subtlety. The eye and the visual cortex of the brain form a massively parallel processor that provides the highest-bandwidth channel into human cognitive centers. At higher levels of processing, perception and cognition are closely interrelated, which is the reason why the words 'understanding' and 'seeing' are synonymous." (Colin Ware, 2000)

"A good visualization is not just a static picture or a 3D virtual environment that we can walk through and inspect like a museum full of statues. A good visualization is something that allows us to drill down and find more data about anything that seems important." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"Chernoff faces have not generally been adopted in practical visualization applications. The main reason for this may be the idiosyncratic nature of faces. When data is mapped to faces, many kinds of perceptual interactions can occur. Sometimes the combination of variables will result in a particular stereotypical face, perhaps a happy face or a sad face, and this will be identified more readily. In addition, there are undoubtedly great differences in our sensitivity to the different features. We may be more sensitive to the curvature of the mouth than to the height of the eyebrows, for example. This means that the perceptual space of Chernoff faces is likely to be extremely nonlinear. In addition, there are almost certainly many uncharted interactions between facial features, and these are likely to vary from one viewer to another." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"Diagrams are always hybrids of the conventional and the perceptual. Diagrams contain conventional elements, such as abstract labeling codes, that are difficult to learn but formally powerful. They also contain information that is coded according to perceptual rules, such as Gestalt principles. Arbitrary mappings may be useful, as in the case of mathematical notation, but a good diagram takes advantage of basic perceptual mechanisms that have evolved to perceive structure in the environment." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"It is useful to think of color as an attribute of an object rather than as its primary characteristic. It is excellent for labeling and categorization, but poor for displaying shape, detail, or space." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"Interactive visualization is a process made up of a number of interlocking feedback loops that fall into three broad classes. At the lowest level is the data manipulation loop, through which objects are selected and moved using the basic skills of eye–hand coordination. Delays of even a fraction of a second in this interaction cycle can seriously disrupt the performance of higher-level tasks. At an intermediate level is an exploration and navigation loop, through which an analyst finds his or her way in a large visual data space." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"The great advantage of the treemap over conventional tree views is that the amount of information on each branch of the tree can be easily visualized. Because the method is space-filling, it can show quite large trees containing thousands of branches. The disadvantage is that the hierarchical structure is not as clear as it is in a more conventional tree drawing, which is a specialized form of node–link diagram." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"The problem with the view that metadata and primary data are somehow essentially different is that all data is interpreted to some extent - there is no such thing as raw data. Every data gathering instrument embodies some particular interpretation in the way it is built. Also, from the practical viewpoint of the visualization designer, the problems of representation are the same for metadata as for primary data. In both cases, there are entities, relationships, and their attributes to be represented, although some are more abstract than others." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"[...] when data is presented in certain ways, the patterns can be readily perceived. If we can understand how perception works, our knowledge can be translated into rules for displaying information. Following perception‐based rules, we can present our data in such a way that the important and informative patterns stand out. If we disobey the rules, our data will be incomprehensible or misleading." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"One reason design is difficult is that the designer already has the knowledge expressed in the design, has seen it develop from inception, and therefore cannot see it with fresh eyes. The solution is to be analytic and this is where this book is intended to add value. Effective design should start with a visual task analysis, determine the set of visual queries to be supported by a design, and then use color, form, and space to efficiently serve those queries." (Colin Ware, "Visual Thinking for Design", 2008)

"Design graphic representations of data by taking into account human sensory capabilities in such a way that important data elements and data patterns can be quickly perceived." (Colin Ware, "Information Visualization: Perception for Design" 4th Ed., 2021)

"Important data should be represented by graphical elements that are more visually distinct than those representing less important information." (Colin Ware, "Information Visualization: Perception for Design" 4th Ed., 2021)

29 December 2006

✏️Anker V Andersen - Collected Quotes

"An economic justification for computer graphics is that the organization spends an enormous amount of money on data processing, often providing managers with too many reports, too many data, and an overload of information. The report output has to be condensed into a more usable form. The computer graph essentially is the data represented in a structured pictorial form. The role of the graph is to provide meaningful reports. To the extent that it does, it can be justified." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"Graphs are used to meet the need to condense all the available information into a more usable quantity. The selection process of combining and condensing will inevitably produce a less than complete study and will lead the user in certain directions, producing a potential for misleading." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"Graphs can present internal accounting data effectively. Because one of the main functions of the accountant is to communicate accounting information to users, accountants should use graphs, at least to the extent that they clarify the presentation of accounting data, present the data fairly, and enhance management's ability to make a more informed decision. It has been argued that the human brain can absorb and understand images more easily than words and numbers, and, therefore, graphs may be better communicative devices than written reports or tabular statements." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"Reliability is highly valued by accountants and has been defined as 'the faithfulness with which it (information) represents what it purports to represent'. The reason reliability is so important is that an essential characteristic of an accounting report is its acceptance, and if a report is considered to be misleading or superfluous, it and future reports will be disregarded." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"Understandability implies that the graph will mean something to the audience. If the presentation has little meaning to the audience, it has little value. Understandability is the difference between data and information. Data are facts. Information is facts that mean something and make a difference to whoever receives them. Graphic presentation enhances understanding in a number of ways. Many people find that the visual comparison and contrast of information permit relationships to be grasped more easily. Relationships that had been obscure become clear and provide new insights." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"The bar graph and the column graph are popular because they are simple and easy to read. These are the most versatile of the graph forms. They can be used to display time series, to display the relationship between two items, to make a comparison among several items, and to make a comparison between parts and the whole (total). They do not appear to be as 'statistical', which is an advantage to those people who have negative attitudes toward statistics. The column graph shows values over time, and the bar graph shows values at a point in time. [A] bar graph compares different items as of a specific time (not over time)." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"The scales used are important; contracting or expanding the vertical or horizontal scales will change the visual picture. The trend lines need enough grid lines to obviate difficulty in reading the results properly. One must be careful in the use of cross-hatching and shading, both of which can create illusions. Horizontal rulings tend to reduce the appearance, while vertical lines enlarge it. In summary, graphs must be reliable, and reliability depends not only on what is presented but also on how it is presented." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"There are several uses for which the line graph is particularly relevant. One is for a series of data covering a long period of time. Another is for comparing several series on the same graph. A third is for emphasizing the movement of data rather than the amount of the data. It also can be used with two scales on the vertical axis, one on the right and another on the left, allowing different series to use different scales, and it can be used to present trends and forecasts." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"There are two kinds of misrepresentation. In one, the numerical data do not agree with the data in the graph, or certain relevant data are omitted. This kind of misleading presentation, while perhaps hard to determine, clearly is wrong and can be avoided. In the second kind of misrepresentation, the meaning of the data is different to the preparer and to the user." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

28 December 2006

✏️Jason Lankow - Collected Quotes

"As with dot plots, the scale on line charts has a lot to do with how the message is conveyed. For example, using too large a scale runs the risk that viewers may gloss over a very important story in the data. However, using too small a scale might lead you to overemphasize minor fluctuations. As with dot plots, designers should plot all of the data points so that the line chart takes up two-thirds of the y-axis’s total scale." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"Because bubble charts have their limitations in conveying information clearly, you shouldn’t overcomplicate them by adding too much detail, manipulating the shapes to make them into money bags, or the like. You also want to avoid using shapes that are not entirely circular (e.g., a money bag or ring with a big ol’ diamond on it). This’ll just end up looking strange. While they are good for conveying high-level differences between subcategories’ values, people also want to understand the information as well - which works best if the differences between the bubble sizes are not very great." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"Bubble charts are a type of area chart that use discrete or continuous data and can be used to display nominal and ranking relationships. You would seldom use them to show only a time series or part-to-whole relationship. Bubble charts can be used to compare subcategories’ values, in either side-by-side comparisons, or in more elaborate graph types such as bubble plots (when showing ranking and time series) and bubble maps (if geography was germane to the story being told). They are most valuable when the range of data set is large, and there is a good amount of variance between the smallest and the largest subcategories. They can also be useful when using bar charts simply looks awkward." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"Bubble charts are botched quite frequently. It’s important to note that the total area (not the radius) of each bubble chart represents a subcategory’s quantitative value. How bad can bubble charts look if you use radius to scale instead of total area? If a designer is trying to use bubble charts to show the difference between two quantitative values - say 2 and 4 - the area of the latter should be twice as large as the former. But if they are basing scale on radius, the graph will be designed in a way that distorts the data. The differences become more pronounced with more difference between the values." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)
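
The area-versus-radius pitfall is easy to check numerically. The sketch below is not from the book; `bubble_radius`, `naive_radius` and the 50-unit maximum radius are hypothetical choices made only for illustration.

```python
import math

def bubble_radius(value: float, max_value: float, max_radius: float) -> float:
    """Radius that makes bubble AREA proportional to value (hypothetical helper)."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    # Area = pi * r**2, so r must grow with the square root of the value.
    return max_radius * math.sqrt(value / max_value)

def naive_radius(value: float, max_value: float, max_radius: float) -> float:
    """The common mistake: radius proportional to value."""
    return max_radius * value / max_value

def area(radius: float) -> float:
    return math.pi * radius ** 2

# The two values from the quote, 2 and 4; the larger bubble gets radius 50.
for scale in (bubble_radius, naive_radius):
    ratio = area(scale(4, 4, 50)) / area(scale(2, 4, 50))
    print(f"{scale.__name__}: larger bubble has {ratio:.2f}x the area")
```

The area-based version prints a ratio of 2.00, while the radius-based one prints 4.00: doubling the value quadruples the ink, and a 1:3 ratio would become 1:9.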

"Color is a unique tool that you should use with care. Bold colors imply emphasis on a notable item, and when colors are used everywhere, it is difficult for people to determine where to direct their attention. When everything is highlighted, nothing is highlighted. Use this power to highlight sparingly on each slide, to point the viewer to the main thrust of your messaging." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"Dashboarding is an area that has been using information design to communicate key business metrics for decades. In terms of purpose, these interfaces have embodied many of the best practices surrounding visual communication. Yet their aesthetic and creative value is often lacking. This is an area of great opportunity in business communication; while well intentioned, the dashboard’s traditional format and appearance could benefit from a bit of a makeover." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"[...] explorative infographics provide information in an unbiased fashion, enabling viewers to analyze it and arrive at their own conclusions. This approach is best used for scientific and academic applications, in which comprehension of collected research or insights is a priority. Narrative infographics guide the viewers through a specific set of information that tells a predetermined story. This approach is best used when there is a need to leave readers with a specific message to take away, and should focus on audience appeal and information retention." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"Good infographics also communicate something meaningful. Communicating a message worth telling provides readers with something of value. While infographics can be a powerful vehicle of communication, they are sometimes produced arbitrarily or when a cohesive and interesting story isn’t present. If the information itself is incomplete, untrustworthy, or uninteresting, attempting to create a good infographic with it is more than a fool’s errand; it’s impossible." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"If used incorrectly, decorative elements have the potential to distract the viewer from the actual information, which detracts from the graphic’s total value. Mastering this execution and finding the balance between appeal and clarity can be a nuanced process." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"One of the main benefits (and reasons for the ubiquity) of static content is the relative ease of creating a static image versus an interactive interface - especially if you want to use the infographic to cover time-sensitive material or breaking news. This efficiency also makes this content relatively affordable compared to motion and interactive content. Another key factor in the rising popularity of static infographics is their ease of shareability, as they can easily be embedded in blogs." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"Stacked bars are most often used when there is a need to display multiple part-to-whole relationships. Stacked bars use discrete or continuous data, and can be oriented either vertically or horizontally. While the aggregate of each bar can be used to make nominal or ranking comparisons, this graph type is used when the composition of each bar tells an interesting story that provides the viewer with greater insight." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"The first thing you must understand is that information design is not limited to the visualization of data, in presentation design or any other application. It can and should be used to visualize other concepts such as hierarchy (org charts), anatomy (portfolio allocation), and chronology (timeline of events). Beyond the bar graphs showing sales figures and monthly projections, there are many more opportunities to explain concepts with visuals that will engage your audience and clarify your key points." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"The most frequent flaw in the design of a deck is likely the inconsistency among its various elements. Fonts vary; charts and graphs are borrowed from different sources; and company logos exist in varied formats, colors, and resolutions. As an agency that specializes in design, we understandably find these piecemeal creations more perturbing than the average person. However, the effect of a well-designed, polished presentation is undeniable - whether it is one that you share just within your company, or at a public speaking engagement. Of course, not every presentation occasion warrants the commission of a designer to create the deck, but we believe that many do. If the situation requires you to make a strong impression, it is essential that the various elements of your presentation fit together seamlessly. You want your audience to feel as though you have chosen your visuals as expertly as you have chosen your words." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"The order of priorities of a commercial marketing graphic would be appeal, retention, and then comprehension. Brands are looking to catch viewers’ attention and make a lasting impression - which usually means that viewers’ comprehension of content is frequently the brands’ last priority. The exception to this would be infographics that are more focused on the description of a product or service, such as a visual press release, since designers in these cases would want the viewer to clearly understand the material as it relates to the company’s value proposition. However, being appealing enough to prospective customers to get them to listen is always goal number one." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"The real value of pie charts is their usefulness in communicating big ideas quickly. However, they’re not very useful in comparing the values of the subcategories between pies (as stacked bars can be), or showing the changing makeup of a part-to-whole relationship over time. This is because it’s hard to compare the sizes of multiple pie “slices” (essentially the angles of their points next to each other) in the same pie or across multiple pies." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"What people often overlook in these debates is the most central issue to any design: the objective. While Tufte and Holmes might want to represent the exact same data set, they likely would be doing it for very different reasons. Tufte would aim to show the information in the most neutral way possible, to encourage his audience to analyze it without bias. Conversely, Holmes’s job is to editorialize the message in order to appeal to the viewer while communicating the value judgment he wants readers to take away. Tufte’s communication is explorative; that is, it encourages the viewer to explore and extract his or her own insights. Holmes’s, on the other hand, is narrative, and prescribes the intended conclusion to the viewer. The difference is inherent in their areas of work, as the objectives of science and research are much different than those of the publishing world. There’s no need to establish a universal approach to govern all objectives; rather, different individuals and industries should develop best practices unique to each application’s specific goal." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"When using dot plots to show a time series relationship, the scale does not have to start at a zero baseline. For the other relationships they do, however. For a time series relationship, the scale can be truncated if there is a story worth telling in the data that would otherwise be obscured by using a very large scale. However, you should use discretion when attempting to do this; a good rule of thumb is to use a scale in which the range of the dot plots consists of two-thirds of the graph’s total height, in order to display data trends more clearly. Additionally, if your goal is to show a time series relationship with continual data, you can throw a line on it, connecting the points. Essentially, you can use a series of straight lines between the points, which will help guide the reader’s eyes from left to right." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)
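
The two-thirds rule of thumb can be turned into axis limits with a little arithmetic: divide the data range by 2/3 to get the axis range, then pad. This is a minimal sketch under one assumption the quote does not state, namely that the spare third is split evenly above and below the data; `two_thirds_limits` is a hypothetical name and the monthly values are made up.

```python
def two_thirds_limits(values, fraction=2 / 3):
    """Return (y_min, y_max) so the data range fills `fraction` of the axis.

    Assumption: the leftover space is split evenly above and below the data.
    Intended for time series, where a truncated (non-zero) baseline is acceptable.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("data range is zero; choose limits manually")
    axis_span = span / fraction          # total height of the y-axis
    pad = (axis_span - span) / 2         # equal margin below and above
    return lo - pad, hi + pad

monthly = [102, 108, 105, 111, 117, 114]  # illustrative values only
y_min, y_max = two_thirds_limits(monthly)
print(f"y-axis from {y_min:.2f} to {y_max:.2f}")
```

For these values the data span 15 units, so the axis spans 22.5 units, with 3.75 units of margin on each side.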

"While the information is of the utmost importance when it comes to soundness, what is done with the information - essentially, how it is designed - is also important. With this in mind, there are two things to consider: format and design quality. If an inappropriate format is used, the outcome will be inferior. Similarly, if the design misrepresents or skews the information deliberately or due to user error, or if the design is inappropriate given the subject matter, it cannot be considered high quality, no matter how aesthetically appealing it appears at first glance." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"[...] while the underlying data is not permanently fixed, the output - or presentation of it - is a static snapshot of the data at a specific moment in time. The advantage of this approach is that you can tell a story (for internal or external purposes) that shows the data as of a particular date or within your desired date ranges. The disadvantage is that the viewer might not necessarily be able to get access to refreshed information in real-time, and might not realize that more current information is available. A static infographic won’t be enough for large groups that require access to real-time information. If you have such a need, you will either need to build an interface that allows multiple people to process and output updated information into reports, or at least have a system for ensuring that people know how to find updated information." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

27 December 2006

✏️Karl G Karsten - Collected Quotes

"All of this information might be useful and even, for certain purposes, necessary. It is, so to speak, the statistical data of the question. But it yields no picture. A map or a globe gives us this mental picture almost in a flash. And that is precisely the use and service of a chart." (Carl Snyder, [in Karl G Karsten, "Charts and Graphs", 1925] 1923)

"A circular, like a square, area varies with the square of its linear measurements. If you make the radius of one circle twice as great as the radius of the other, the first area will be four times as great as the second. If you make the areas proportionate, the radii must be in the relation of 1 to the square root of 2. Both circle and square require the more or less tedious computation of square roots and repay this labor with inaccurate and ambiguous results." (Karl G Karsten, "Charts and Graphs", 1925)

"A curve cannot, however, always be used in the place of a bar-chart, for the line which connects the various points implies that the data itself can be considered connected. Much data can not be so considered. A careful inspection of the data will soon show whether it is connected or not, for the stubs of connected data always form a variable." (Karl G Karsten, "Charts and Graphs", 1925)

"A further detail of the 100% bar and its labelling, is the scale. This should generally be in hundredths or percents. The data may be entirely in absolute quantities, but nevertheless the scale should show percentages. To prevent the confusion of scale and divisions of the bar, the scale should be outside the bar, and the best practice seems to be to indicate the scale by little notches or short perpendicular lines dropped below the bar, from its lower edge." (Karl G Karsten, "Charts and Graphs", 1925)

"A quantity can always be illustrated by a straight line, or, as it is commonly called, a 'bar'. Bars are the simplest and often the best form of [...]. The total length of the line then represents the total value of the quantity. When we speak of a line in charting, we do not mean an imaginary straight line having neither width nor depth, for that would be invisible and could not, of course, be actually used in illustrations. In its place we use the bar, with a visible width (and the actual depth or thickness of a layer of ink). But it is still proper to speak of this bar as being a line or one-dimension chart, for its width and thickness are constants, necessary to give visibility to the line, and its length alone is significant." (Karl G Karsten, "Charts and Graphs", 1925)

"A series of quantities or values can be most simply and often best shown by a series of corresponding lines or bars. All bars being drawn against one and the same scale, their lengths vary with the amounts which they represent." (Karl G Karsten, "Charts and Graphs", 1925)

"Another principle which will quickly appeal to your common sense, is the rule that when zero is real, the zero-line should be extra heavy to make it prominent. Remember that it takes the place of the floor or lower end of the bars in the bar-chart. It should stand out, therefore, in such a way that the reader can easily grasp its significance and compare with it the heights of the points on the curve. The rule is particularly important in cases where the chart extends down below the zero line into the negative side in order to show negative and positive values. On the same principle the 100% line, when it occurs in a chart, should be similarly heavy as it also may be considered a base for zero points, being the point of zero loss or gain. In fact, the rule may be extended to all cases of lines showing significant constant values, and the zero line should not be heavy, unless it has a special significance." (Karl G Karsten, "Charts and Graphs", 1925)

"Bar-charts are most flexible and can be varied to suit the individual whims of the maker. In general, however, there is one style or form which will be found most satisfactory. It consists of a horizontal grouping of bars alongside of the data. The chart is arranged in tabular form, with items or stubs in a column to the left, with figures in a column beside the stubs and with bars in a column beside the figures. Several columns of figures are sometimes desirable, just as in the table of data, to show sources or original figures from which the charted figures are obtained. In any case, the bars should represent the most important set or column of figures, and there should be normally but one column of bars." (Karl G Karsten, "Charts and Graphs", 1925)

"Having confessed so little patience with the doctrine of the incomprehensible per se, we have naturally sought to empty the entire bag of tricks, and to tell the whole story of the chart in the simplest words that we command. Our belief has been that it is a lesser sin to be too easily understood than never understood at all. But at the same time, we have sought to make the story full and complete." (Karl G Karsten, "Charts and Graphs", 1925)

"Having prepared your data, you will next decide upon a 'scale' or ratio of reduction to use in the drawing, that is, what value or distance on the actual floor shall be represented by each space or distance between lines on the paper. It is important to pick a scale which is neither too large nor too small, so that the drawing will be the right size on the sheet." (Karl G Karsten, "Charts and Graphs", 1925)

"In all chart-making, the material to be shown must be accurately compiled before it can be charted. For an understanding of the classification chart, we must delve somewhat into the mysteries of the various methods of classification and indexing. The art of classifying calls into play the power of visualizing a 'whole' together with all its 'parts'. Even in the most exact science, it is not always easy to break up a whole into a complete set of the distinct, mutually exclusive parts which together exactly compose it." (Karl G Karsten, "Charts and Graphs", 1925)

"In fact, it can be laid down as a general rule that both the compound and the multiple bar-charts are too elaborate and complicated. A chart is always better the simpler it is, and we should make strong efforts to simplify these charts, and if possible reduce them to simple bar-charts. It usually pays well for sacrifices we make in this way, in legibility and interest to the reader, and after all, the chart of this type is generally directed at a reader, rather than at the maker." (Karl G Karsten, "Charts and Graphs", 1925)

"In short, the pie-chart appears to be a two-dimension (area) chart used for one-dimension data. The fact is, however, that, as in the case of the 100% bar, the area of the chart varies directly with one dimension, the other dimension being constant. In the 100% bar the width of the bar was constant; in the 100% circle the radius must be constant for all circles compared. Then the area of the segments varies directly with their arcs or angles and the chart has but one significant dimension. It is only an apparent exception to the rule." (Karl G Karsten, "Charts and Graphs", 1925)
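
Karsten's argument can be verified with the formula for a circular sector, area = (θ / 360) · π r². With r held constant, area is a fixed multiple of the angle, so each segment's share of the area equals its share of the data. The sketch below assumes a unit radius; `pie_segments` is a hypothetical helper, not something from the book.

```python
import math

def pie_segments(values, radius=1.0):
    """Return (angle_degrees, area) for each segment of a pie chart.

    With a constant radius, a segment's area is (angle / 360) * pi * r**2,
    so area varies directly with angle: one significant dimension.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    segments = []
    for v in values:
        angle = 360.0 * v / total
        area = (angle / 360.0) * math.pi * radius ** 2
        segments.append((angle, area))
    return segments

shares = [50, 30, 20]  # illustrative values only
for value, (angle, area) in zip(shares, pie_segments(shares)):
    print(f"value {value}: {angle:.1f} degrees, area {area:.3f}")
```

Here the segments get 180, 108 and 72 degrees, and their areas are exactly 50%, 30% and 20% of the circle. Varying the radius between pies would break this, which is why Karsten insists it stay constant.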

"In short, the rule that no more dimensions or axes should be used in the chart than the data calls for, is fundamental. Violate this rule and you bring down upon your head a host of penalties. In the first place, you complicate your computing processes, or else achieve a grossly deceptive chart. If your chart becomes deceptive, it has defeated its purpose, which was to represent accurately. Unless, of course, you intended to deceive, in which case we are through with you and leave you to Mark Twain’s mercies. If you make your chart accurate, at the cost of considerable square or cube root calculating, you still have no hope, for the chart is not clear; your reader is more than likely to misunderstand it. Confusion, inaccuracy and deception always lie in wait for you down the path departing from the principle we have discussed - and one of them is sure to catch you." (Karl G Karsten, "Charts and Graphs", 1925)

"In short, the scales on which a curve is drawn can affect very much our impressions of the data by magnifying or minimizing the apparent movements of the curve itself. Of course, this does not mean that the relative height from the base-line of the various points on the curve have been altered. If you have been careful to show the base-line always, the base-line itself will approach nearer to the curve as the vertical scale is reduced and the wiggles are flattened out, and will recede farther from the curve as the vertical scale is enlarged and the wiggles are exaggerated. But it means that the oscillation or fluctuation of the curve will have been made to appear more violent or milder according as either of the scales is changed. And it therefore behooves us to give serious thought to the matter of scales before we determine upon them finally for any particular chart. As a matter of fact, we may have to try out several combinations of scales before we find one which gives just the right amount of emphasis to curve fluctuations to suit us." (Karl G Karsten, "Charts and Graphs", 1925)

"In the labelling of the pie-chart, you will furthermore encounter typographical difficulties. It is not ordinarily a good thing to make a reader crane his neck at various angles to read writing along every point of the compass, so you should not, as so many do, write on radii from the center of the circle. On the other hand, unless the chart and its segments are very large as compared with the size of the printing, you will introduce tricky optical illusions if you write all labels in the same directions inside the segments." (Karl G Karsten, "Charts and Graphs", 1925)

"Moreover in the pipe-organ or vertical-bar chart, we first encounter labelling or data difficulties. And if there is one motto which we should like to print at the bottom of every page in bold-face type, as do the publishers of other valuable reference-books, it is this: 'Never separate your chart from its data'. On the contrary, incorporate the data in the chart. For a chart without its data is a poor lost thing indeed. And the unhappy reader wishing to know what it means must hunt and hunt and hunt till he locates the particular information in some distant table. As a matter of fact, he won’t do it, for before he has found his data he has lost his interest in the matter, and then what good is your chart." (Karl G Karsten, "Charts and Graphs", 1925)

"Most of the good things in this world involve some sacrifice. Curves are no exception. In a curve the direct visible connection between the curve itself and the zero line, or x-axis, is sacrificed. As time goes on and you become more and more used to the curve chart, you will begin to think of its values as in some mysterious manner floating disembodied along the connecting line which forms the curve. You will be tempted to forget that the quantities rest very substantially upon the floor (base line, zero line, x-axis or whatever you want to call it), and that it is only their tops which reach the points plotted in the curve. And forgetting this, you will try to save space by omitting the zero line and lower part of the chart, and by showing only that small portion or band of the chart through which the plotted curve travels." (Karl G Karsten, "Charts and Graphs", 1925)

"Multiple curves are far better than multiple bar charts. A number of curves wiggling across the page at the tops of invisible bars are eminently more satisfactory than actual bars interlarded. In the first place, comparison of several series of data is greatly facilitated in curves - because each set has been condensed and simplified into a single line. There is no difficulty in comparing values of each series with each other. In the second place, such a comparison is more accurate in curves because all similar points on various sets or series have been brought together upon a single vertical line." (Karl G Karsten, "Charts and Graphs", 1925)

"Note also, and this is important, that if through standing too close you should take a picture showing only the upper ends of the upright boards, but not their full lengths, you would consider the resulting picture not only a failure but actually deceptive. In other words, you must not omit the zero-line or base-line. While you would succeed in showing the variation of the top ends more clearly you would no longer have comparable lengths." (Karl G Karsten, "Charts and Graphs", 1925)

"Now figures are not in themselves necessarily dry and dull - in fact the figures of your bank-account may be very engrossing to you. But figures on uninteresting subjects are a sure cure for insomnia, to all of us. And it goes without saying that if the figures are not of consequence, the chart of these figures will deserve equally little attention. The point is that a chart is as weak as its own data, and a chart-maker must carefully weigh and consider his data before permitting himself the pleasure of illustrating them with a chart." (Karl G Karsten, "Charts and Graphs", 1925)

"The advantage of the pie-chart is psychological. It instantly commands the reader’s attention. A circle is, of all geometrical patterns, the easiest resting spot for the eye. The fact is well known to advertisers, who frequently use circles and circular outlines to draw attention to their advertisements. Hence if your chart is designed for publication, or for presentation to readers whose attention may be easily diverted, you will find the pie-chart a powerful means for presenting your facts. Attention will be focused upon it at once, and it is as simple to understand as its name - far too simple for anyone to misunderstand. Because it is circular, there is no question but that it represents a whole and the various slices of the pie belong to their respective items." (Karl G Karsten, "Charts and Graphs", 1925)

"The chief value of the 'pipe-organ chart' [aka bar chart] as it is sometimes called, lies in the realistic picture it gives of quantities. From a base line these quantities are seen to rise the full length of the bars, as so much substantial material stacked neatly in piles where we can compare them. We view them from the level or floor on which they are piled. We do not have to climb up and get a bird’s-eye view of them as in the ordinary bar-chart, where we seem to be looking down upon rows and rows of goods, but we see them from a natural view-point. Nor do we rely upon an arbitrary arrangement by which their left ends have been brought together as in the bar-chart, but we know instantly that if they are piled up, it is their tops which we must watch. The pipe-organ chart finds instant response in our minds, and appeals to us as both logical and natural. A child can comprehend it." (Karl G Karsten, "Charts and Graphs", 1925)

"The disadvantages of the pie-chart are many. It is worthless for study and research purposes. In the first place, the human eye cannot easily compare as to length the various arcs about the circle, lying as they do in different directions. In the second place, the human eye is not naturally skilled at comparing angles - those angles at the center of the circle, formed by the various rays or radii and subtending the various arcs. In the third place, the human eye is not an expert judge of comparative sizes of areas, especially those as irregular as the segments of parts of the circle. There is no way by which the parts of this round unit can be compared so accurately and quickly as the parts of a straight line or bar. Moreover, when, as frequently happens, several pie-charts are shown together, the various slices in one chart cannot be so easily compared with the corresponding slices in the next, as can the various parts of one 100% bar with corresponding parts of another bar." (Karl G Karsten, "Charts and Graphs", 1925)

"The division of a 'whole' into its 'parts' is logically one of the first steps in any analysis. Usually the graph illustrating this division belongs at the beginning of a statistical report. Thus, if your report covers the sales of the company, your first chart would break up total sales into the individual sales for each line or for each district. The remainder of the report, treating of details of the various 'parts' (e.g., lines or districts) will then follow a summary chart which has established their relative importance." (Karl G Karsten, "Charts and Graphs", 1925)

"The greatest contribution to chart-making, from any single source, is the Gantt Progress Chart. This chart is, unquestionably, the most powerful graphic device for business and for all executive and managerial purposes. While the description has been rather full, as given herein, it is by no means complete; and the Gantt charting methods, in all their co-ordinated ramifications, constitute an independent system of accounting and of executive control, in this [...]" (Karl G Karsten, "Charts and Graphs", [preface] 1925)

"The technique of bar-charts is so simple and they are so very effective, that they should be used freely in printed text-matter. No drawing or plates are needed. Printers have 'rules' as they call them, which can be used to make solid bars, and these rules can easily be set up together with the type. The scale and field can be omitted and the bars alone will effectively tell the story of the main figures in the table. The combined table and chart can be used in printed text just as well as the table alone." (Karl G Karsten, "Charts and Graphs", 1925)

"These apparently arbitrary rules of thumb are justified only so long as they serve to produce the best results. Your real purpose is to show the data most clearly and simply, either to yourself or to someone else. The chart is a window, as it were, through which the reader looks out upon an illuminating picture of the facts he is considering. Through this window he sees, if you like, a chain of mountains, whose height tells him the values or quantities he is considering. That he may see them to the best advantage, the window must be low enough for him to see the base of the mountain-range and high enough for him to see at least some sky above the highest peak. In general, the best view of the mountains would show neither too much nor too little clear sky above. And if the window is crossed with a framework for small window-panes, he can further judge of heights by the crisscross window-pane lines. Your curve is the silhouette of that mountain-range, your field the tiny window-pane outlines, and you, the chart-maker, must use your own judgment and artistic sense to place the reader’s chair near or far, high or low, in front of that window, to give him the clearest view." (Karl G Karsten, "Charts and Graphs", 1925)

"This practice of omitting the zero line is all too common, but it is not for that reason excusable. The amputated chart is a deceptive one, tempting the average reader to compare the heights of points on the curve from the false bottom of the amputated chart-field, rather than from the true zero line, far below and invisible. A curve-chart without a zero line is in general no whit less of a printed lie, than a vertical bar-chart in which the lower part of the bars themselves are cut away. The representation of comparative sizes has been distorted and the fluctuations (changes in value) exaggerated." (Karl G Karsten, "Charts and Graphs", 1925)

"Throughout your study of charts you will find some which are more useful for popular consumption than others, but you will not find many which are more purely popular in appeal than the 100% circle or pie diagram. For analytical purposes it has nothing to recommend it, but for sensational values it is in general without an equal." (Karl G Karsten, "Charts and Graphs", 1925)

"To make a bar-chart popular, knock it over flat on its side, so that the bars stand up on end. Simple, isn’t it? But that’s the rule. There being nothing more to discuss in the matter of making popular bar-charts, we are tempted to close the discussion at this point and produce a pleasant surprise to all. But the vertical bar-chart [aka column chart] is rich in suggestions for the higher forms of charts which we are approaching, and it deserves a close study." (Karl G Karsten, "Charts and Graphs", 1925)

"We have so consistently inveighed against the use of areas to illustrate quantities that the reader will indeed be surprised at some coming retractions. [...] But the fact is that we now propose to turn to advantage the very feature of areas which has previously been their greatest fault. [...] We now come to data in which we wish to show simultaneously three ratios or sets of ratios, one of which is always the product of the other two. In other words, we wish to show two factors or sets of factors and their product." (Karl G Karsten, "Charts and Graphs", 1925)

"When several curves are shown upon the same chart, it is often desirable to use different scales for them. That is, the same horizontal lines may be given two or even more different values for different curves. But even in these cases, it is better to place both scales, once and for all, at the left hand side. The practise of placing one of these scales at the right hand side, and another at the left hand side, has little to recommend it. Theoretically, at least, the left hand end of your chart is normally the y-axis itself, and the scale or scales should logically be attached immediately thereto. In practice this logical position is justified." (Karl G Karsten, "Charts and Graphs", 1925)

✏️Florence Nightingale - Collected Quotes

"Diagrams are of great utility for illustrating certain questions of vital statistics by conveying ideas on the subject through the eye, which cannot be so readily grasped when contained in figures." (Florence Nightingale, "Mortality of the British Army", 1857)

"Whenever I am infuriated, I revenge myself with a new Diagram." (Florence Nightingale, [letter to Sidney Herbert] 1857)

"But law is no explanation of anything; law is simply a generalization, a category of facts. Law is neither a cause, nor a reason, nor a power, nor a coercive force. It is nothing but a general formula, a statistical table." (Florence Nightingale, "Suggestions for Thought", 1860)

"Newton's law is nothing but the statistics of gravitation, it has no power whatever. Let us get rid of the idea of power from law altogether. Call law tabulation of facts, expression of facts, or what you will; anything rather than suppose that it either explains or compels." (Florence Nightingale, "Suggestions for Thought", 1860)

"Again I must repeat my objections to intermingling causation with statistics. It might be to a certain extent admissible if you had no sanitary head. But you have one, & his report should be quite separate. The statistician has nothing to do with causation: he is almost certain in the present state of knowledge to err." (Florence Nightingale, [letter] 1861)

"All do statistics, some on paper, some by memory. Those who fail take care to give no statistics. Among those who succeed or think they have succeeded are some of small or accidental experience." (Florence Nightingale) 

"All sciences of observation depend upon statistical methods; without these [they] are blind empiricism. Make your facts comparable before deducing causes." (Florence Nightingale) 

"Only by consulting the past can the statesman judge for the future, recognize the elements necessary to realize plans, appreciate what needs reform." (Florence Nightingale)

"Statistics are necessary to appreciate the effects of law." (Florence Nightingale) 

"To understand God's thoughts we must study statistics, for these are the measure of His purpose." (Florence Nightingale) [attributed]

26 December 2006

✏️John B Peddle - Collected Quotes

"A family of chart-forms of great structural simplicity is that which is known under the general name of the 'proportional' or 'parallel alinement' type. The ease with which they may be laid out and the fact that they may be used with certain forms of equations which cannot be handled so conveniently by those types previously described are strong recommendations for their use in these cases." (John B Peddle, "The Construction of Graphical Charts", 1910)

"A more important case is where the divisions are laid off to a logarithmic scale. Paper ready ruled in this way may now be had from dealers in mathematical instruments and is valuable for many purposes. On it many problems which would have to be solved by tediously drawn curves, may be worked with ease by straight lines." (John B Peddle, "The Construction of Graphical Charts", 1910)
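A short worked illustration of why logarithmic ruling turns curves into straight lines. The quote names no particular relation, so a power law y = a x^b and an exponential y = a e^{kx} are assumed here as the two classic cases:

```latex
y = a\,x^{b} \;\Longrightarrow\; \log y = \log a + b \log x
\qquad\qquad
y = a\,e^{kx} \;\Longrightarrow\; \ln y = \ln a + k\,x
```

On paper ruled logarithmically on both axes, the plotted distances are \log x and \log y, so the power law becomes a straight line of slope b; with only the vertical axis logarithmic (semi-log paper), the exponential becomes a straight line of slope k.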

"A type of chart which has received considerable attention of late years and which differs radically from those already described is that known as the alinement chart. In the charts hitherto examined the necessary lines were plotted on what are known as rectangular coordinates; that is, the axes on which the values of x and y were plotted met at a right angle. This is by no means a necessary condition. The axes may be parallel [...]" (John B Peddle, "The Construction of Graphical Charts", 1910)

"Except in some of the simplest cases where the line connecting the plotted data is straight, it will generally be possible to fit a number of very different forms of equation to the same curve, none of them exactly, but all agreeing with the original about equally well. Interpolation on any of these curves will usually give results within the desired degree of accuracy. The greatest caution, however, should be observed in exterpolation, or the use of the equation outside of the limits of the observations." (John B Peddle, "The Construction of Graphical Charts", 1910)

"In fitting an equation to a given set of observations the first step is to draw through the plotted points a smooth curve. If the experimental work has been carefully and accurately done the curve may be made to pass through, or close to, almost all the points. If not, the curve must be drawn in such a way as to represent a good probable average; that is, so as to leave about an equal number of points at about equal distances on either side of it, these distances, of course, being kept as small as possible. Such a curve is assumed to represent the most probable values of the observations, and we then attempt to get its equation." (John B Peddle, "The Construction of Graphical Charts", 1910)

"In getting an algebraic expression to show the relations between the components of a given set of data there may be two entirely distinct objects in view, one being to determine the physical law controlling the results and the other to get a mathematical expression, which may or may not have a physical basis, but which will enable us to calculate in a more or less accurate manner other results of a nature similar to those of the observations." (John B Peddle, "The Construction of Graphical Charts", 1910)

"The graduated lengths along the different axes may be anything we choose to make them. In general, they should be about equal and as long as possible while keeping the size of the chart within reasonable limits." (John B Peddle, "The Construction of Graphical Charts", 1910)

"The simplest form of graphical chart is that which is frequently used to compare different systems of units of the same character with each other. [...] It is exceedingly simple to construct and to use." (John B Peddle, "The Construction of Graphical Charts", 1910)

"[...] so far as I know, no systematic general method has ever been devised which will give the correct form of equation to be used. The discovery of the equation's form is to a large extent a matter of intuition which can only be acquired by long experience. Some persons seem to be peculiarly gifted in the ability to pick out the proper kind of equation for use in compensating a particular set of observations, but for the rank and file of the men engaged on experimental work this is, and probably always must be, a matter of pure guesswork, which must be verified by cut-and-try methods." (John B Peddle, "The Construction of Graphical Charts", 1910)

"Two dimensional charts for the representation of mathematical equations or experimental data are in very common use nowadays and are everywhere recognized as valuable devices for giving a clear conception of the manner in which the variables are related. Their application is generally restricted, however, to cases where there is but one variable and its function, if the variation to be shown is continuous. Nevertheless cases often arise in which there are two variables and a function to be represented and where it is desirable to show a continuous variation for all three." (John B Peddle, "The Construction of Graphical Charts", 1910)

"When an alinement chart is intended to cover a considerable range of values we are confronted with the difficulty that it must be large, and therefore awkward to handle, or we must have scale divisions which are too small for accurate reading. These difficulties may be overcome with but little additional trouble by a system of double graduation of the axes." (John B Peddle, "The Construction of Graphical Charts", 1910) [on double axes] 

✏️Mary H Briscoe - Collected Quotes

"A good chart delineates and organizes information. It communicates complex ideas, procedures, and lists of facts by simplifying, grouping, and setting and marking priorities. By spatial organization, it should lead the eye through information smoothly and efficiently." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"A graph is a system of connections expressed by means of commonly accepted symbols. As such, the symbols and symbolic forms used in making graphs are significant. To communicate clearly this symbolism must be acknowledged." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"A slide is usually seen for less than 30 seconds, so its impact has to be immediate. For this reason, figures for slides must be especially simple and succinct. A good slide makes no more than three points, and these points augment, emphasize, and explain the speaker's words." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"An axis is the ruler that establishes regular intervals for measuring information. Because it is such a widely accepted convention, it is often taken for granted and its importance overlooked. Axes may emphasize, diminish, distort, simplify, or clutter the information. They must be used carefully and accurately." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Because 'reality' and 'truth' are essential in these figures, it is important to be straightforward and thoughtful in the selection of the areas to be used. Manipulation such as enlargement, reduction, and increase or decrease of contrast must not distort or change the information. Touch-up is permissible only to eliminate distracting artifacts. Labels should be used judiciously and sparingly, and should not hide or distract from important information." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Good ideas do not communicate themselves. Ideas must be organized. Highly complex ideas need to be clarified and simplified whereas diffuse data may benefit from being combined. Ideas and data must be made interesting and comprehensible to those not familiar with them." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"If you have a choice of presenting your information in tables or graphs, choose the graph. A graph conveys the information more quickly and easily than a table. It also shows the information more impressively and memorably. However, if the information can be said in one or two sentences or if the absolute numerical values are necessary in the presentation, use words or tables. To emphasize essential numerical data, use a graph with the table." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Information for tables should be simplified as much as possible. Leave out data that have no bearing on the point you want to make. […] When simplifying by eliminating information, consider carefully the purpose of the table. If the intention is to summarize findings, the use of means and standard errors would be most effective. If the findings are to be compared and related, use only the pertinent data sets. Be selective about the number of data sets if documentation or facilitation of calculation is the goal. For reproduction of the experiment, two or more tables may be better than one if the information is long and complicated." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Labels should be complete but succinct. Long and complicated labels will defeat the viewer and therefore the purpose of the graph. Treat a label as a cue to jog the memory or to complete comprehension. Shorten long labels; avoid abbreviations unless they are universally understood; avoid repetition on the same graph. A title, for instance, should not repeat what is already in the axis labels. Be consistent in terminology." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Often many tracings are shown together. Extraneous parts of the tracings must be eliminated and relevant tracings should be placed in a logical order. Repetitious labels should be eliminated and labels added that will fully clarify your information." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Putting a box around items serves to isolate and emphasize them. Because a legend needs no emphasis, this is not a good idea. Do not add extra lines to a graph unless you have a good, functional reason for it. The simpler and less cluttered your graph is, the better it will communicate." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

✏️Ben Jones - Collected Quotes

"As presenters of data visualizations, often we just want our audience to understand something about their environment – a trend, a pattern, a breakdown, a way in which things have been progressing. If we ask ourselves what we want our audience to do with that information, we might have a hard time coming up with a clear answer sometimes. We might just want them to know something." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

"Data is dirty. Let's just get that out there. How is it dirty? In all sorts of ways. Misspelled text values, date format problems, mismatching units, missing values, null values, incompatible geospatial coordinate formats, the list goes on and on." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

"Data visualizations are either used (1) to help people complete a task, or (2) to give them a general awareness of the way things are, or (3) to enable them to explore the topic for themselves." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"I believe that the backlash against statistics is due to four primary reasons. The first, and easiest for most people to relate to, is that even the most basic concepts of descriptive and inferential statistics can be difficult to grasp and even harder to explain. […] The second cause for vitriol is that even well-intentioned experts misapply the tools and techniques of statistics far too often, myself included. Statistical pitfalls are numerous and tough to avoid. When we can't trust the experts to get it right, there's a temptation to throw the baby out with the bathwater. The third reason behind all the hate is that those with an agenda can easily craft statistics to lie when they communicate with us […] And finally, the fourth cause is that often statistics can be perceived as cold and detached, and they can fail to communicate the human element of an issue." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"The first epistemic principle to embrace is that there is always a gap between our data and the real world. We fall headfirst into a pitfall when we forget that this gap exists, that our data isn't a perfect reflection of the real-world phenomena it's representing. Do people really fail to remember this? It sounds so basic. How could anyone fall into such an obvious trap?" (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

"To make the best decisions in business and in life, we need to be adept at many different forms of thinking, including intuition, and we need to know how to incorporate many different types of inputs, including numerical data and statistics (analytics). Intuition and analytics don't have to be seen as mutually exclusive at all. In fact, they can be viewed as complementary." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

"The way we explore data today, we often aren't constrained by rigid hypothesis testing or statistical rigor that can slow down the process to a crawl. But we need to be careful with this rapid pace of exploration, too. Modern business intelligence and analytics tools allow us to do so much with data so quickly that it can be easy to fall into a pitfall by creating a chart that misleads us in the early stages of the process." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

"What is the purpose of collecting data? People gather and store data for at least three different reasons that I can discern. One reason is that they want to build an arsenal of evidence with which to prove a point or defend an agenda that they already had to begin with. This path is problematic for obvious reasons, and yet we all find ourselves traveling on it from time to time. Another reason people collect data is that they want to feed it into an artificial intelligence algorithm to automate some process or carry out some task. […] A third reason is that they might be collecting data in order to compile information to help them better understand their situation, to answer questions they have in their mind, and to unearth new questions that they didn't think to ask." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

✏️Sam Lau - Collected Quotes

"Adjusting scale is an important practice in data visualization. While the log transform is versatile, it doesn’t handle all situations where skew or curvature occurs. For example, at times the values are all roughly the same order of magnitude and the log transformation has little impact. Another transformation to consider is the square root transformation, which is often useful for count data." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"As data scientists, we create data visualizations in order to understand our data and explain our analyses to other people. A plot should have a message, and it’s our job to communicate this message as clearly as possible." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Box plots (also known as box-and-whisker plots) give a visual summary of a few important statistics of a distribution. The box denotes the 25th percentile, median, and 75th percentile, the whiskers show the tails, and unusually large or small values are also plotted. Box plots cannot reveal as much shape as a histogram or density curve. They primarily show symmetry and skew, long/short tails, and unusually large/small values (also known as outliers)." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Ignoring sampling weights can give a misleading presentation of a distribution. Whether for a histogram, bar plot, box plot, two-dimensional contour, or smooth curve, we need to use the weights to get a representative plot." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"It’s important to choose a perceptually uniform color palette. By this we mean that when a data value is doubled, the color in the visualization looks twice as colorful to the human eye. We also want to avoid colors that create an afterimage when we look from one part of the graph to another, colors of different intensities that make one attribute appear more important than another, and colors that colorblind people have trouble distinguishing between. We strongly recommend using a palette or a palette generator made specifically for data visualizations." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Many people mistakenly think that the defining property of a simple random sample is that every unit has an equal chance of being in the sample. However, this is not the case. A simple random sample of n units from a population of N means that every possible collection of n of the N units has the same chance of being selected. A slight variant of this is the simple random sample with replacement, where the units/marbles are returned to the urn after each draw. This method also has the property that every sample of n units from a population of N is equally likely to be selected. The difference, though, is that there are more possible sets of n units because the same marble can appear more than once in the sample." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)
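The distinction can be checked by brute force on a tiny population. The sketch compares a simple random sample of 2 from 4 units with a hypothetical "pick one of two fixed pairs" design that is invented here as a counterexample. Both designs give every unit the same chance of selection, but only the simple random sample gives every collection the same chance.

```python
import itertools
from fractions import Fraction

population = ["A", "B", "C", "D"]  # hypothetical population, N = 4
n = 2

# Simple random sample without replacement: all C(4, 2) = 6 collections
# are possible, each with probability 1/6.
srs = {frozenset(c): Fraction(1, 6) for c in itertools.combinations(population, n)}

# Hypothetical alternative design: choose one of two fixed pairs with probability 1/2.
pairs = {frozenset("AB"): Fraction(1, 2), frozenset("CD"): Fraction(1, 2)}

def inclusion_prob(design, unit):
    """Probability that `unit` ends up in the sample under `design`."""
    return sum(p for sample, p in design.items() if unit in sample)

for unit in population:
    # Both designs give every unit the same 1/2 chance of selection...
    print(unit, inclusion_prob(srs, unit), inclusion_prob(pairs, unit))

# ...but only the SRS makes all 6 collections possible (and equally likely).
print(len(srs), len(pairs))  # 6 2

# With replacement, repeats such as ("A", "A") become possible, so there are
# more distinct unordered collections. (Under with-replacement drawing it is
# the N**n = 16 ordered sequences that are equally likely.)
with_replacement = list(itertools.combinations_with_replacement(population, n))
print(len(with_replacement))  # 10
```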

"Researchers have studied how accurately people can read information displayed in different types of plots. They have found the following ordering, from most to least accurately judged: positions along a common scale, like in a rug plot, strip plot, or dot plot; positions on identical, nonaligned scales, like in a bar plot; length, like in a stacked bar plot; angle and slope, like in a pie chart; area, like in a stacked line plot or bubble chart; volume and density, like in a three-dimensional bar plot; and color saturation and hue, like when overplotting with semitransparent points." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Several key assumptions enter into this urn model, such as the assumption that the vaccine is ineffective. It’s important to keep track of the reliance on these assumptions because our simulation study gives us an approximation of the rarity of an outcome like the one observed only under these key assumptions." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Shape matters because models and statistics based on symmetric distributions tend to have more robust and stable properties than highly skewed distributions." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Side-by-side box plots offer a similar comparison of distributions across groups. The box plot offers a simpler approach that can give a crude understanding of a distribution. Likewise, violin plots sketch density curves along an axis for each group. The curve is flipped to create a symmetric 'violin' shape. The violin plot aims to bridge the gap between the density curve and box plot." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Smoothing and aggregating can help us see important features and relationships, but when we have only a handful of observations, smoothing techniques can be misleading. With just a few observations, we prefer rug plots over histograms, box plots, and density curves, and we use scatterplots rather than smooth curves and density contours. This may seem obvious, but when we have a large amount of data, the amount of data in a subgroup can quickly dwindle. This phenomenon is an example of the curse of dimensionality." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Stacked line plots are even more difficult to read because we have to judge the gap between curves as they jiggle up and down." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"The urn model is a simple abstraction that can be helpful for understanding variation. This model sets up a container (an urn, which is like a vase or a bucket) full of identical marbles that have been labeled, and we use the simple action of drawing marbles from the urn to reason about sampling schemes, randomized controlled experiments, and measurement error. For each of these types of variation, the urn model helps us estimate the size of the variation using either probability or simulation." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)
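A minimal simulation of the urn model follows. The urn size, the number of marbles labeled 1, and the number of draws are all hypothetical. Drawing without replacement many times gives an empirical distribution of the count of 1s. Its mean can then be checked against the exact value n·k/N.

```python
import random
import statistics

def draw_counts(n_marbles, n_ones, n_draws, n_reps, seed=0):
    """Draw `n_draws` marbles without replacement from an urn holding
    `n_ones` marbles labeled 1 (the rest labeled 0), repeat `n_reps` times,
    and return the number of 1s drawn in each repetition."""
    rng = random.Random(seed)
    urn = [1] * n_ones + [0] * (n_marbles - n_ones)
    return [sum(rng.sample(urn, n_draws)) for _ in range(n_reps)]

# Hypothetical urn: 20 marbles, 8 labeled 1; draw 5 at a time.
counts = draw_counts(n_marbles=20, n_ones=8, n_draws=5, n_reps=10_000)

# Exact expectation: 5 * 8 / 20 = 2.0; the simulated mean should be close.
print(statistics.mean(counts))
# The spread of the simulated counts estimates the size of the variation.
print(statistics.stdev(counts))
```

This is the "simulation" route the quote mentions. The "probability" route would compute the same spread exactly from the hypergeometric distribution.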

"Through data visualization, we want to reveal important features of the data, like the shape of a distribution and the relationship between two or more features. As this example shows, after we produce an initial plot, there are still other aspects we need to consider." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"We divide accuracy into two basic parts: bias and precision (also known as variation). Our goal is for the darts to hit the bullseye on the dartboard and for the bullseye to line up with the unseen target. The spray of the darts on the board represents the precision in our measurements, and the gap from the bullseye to the unknown value that we are targeting represents the bias." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)
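The dartboard picture can be simulated when, unlike in practice, the true value is known. The sketch assumes normally distributed measurement noise, and its true value, bias, and spread are hypothetical numbers. The mean offset recovers the bias, and the standard deviation recovers the precision.

```python
import random
import statistics

def measure(true_value, bias, spread, n, seed=0):
    """Simulate n measurements shifted by a fixed `bias` and scattered with
    standard deviation `spread` (normal noise is an assumption)."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, spread) for _ in range(n)]

TRUE_VALUE = 10.0  # the "unseen target", known here only because we simulate
darts = measure(true_value=TRUE_VALUE, bias=1.5, spread=0.5, n=5_000)

est_bias = statistics.mean(darts) - TRUE_VALUE  # gap from bullseye to target
est_spread = statistics.stdev(darts)            # spray of the darts (precision)
print(round(est_bias, 2), round(est_spread, 2))
```

Taking more measurements shrinks the uncertainty in `est_bias` but does nothing to remove the bias itself.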

"When interpreting a histogram or density curve, we examine the symmetry and skewness of the distribution; the number, location, and size of high-frequency regions (modes); the length of tails (often in comparison to a bell-shaped curve); gaps where no values are observed; and unusually large or anomalous values." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"When we examine relationships between qualitative features, we examine proportions of one feature within subgroups defined by another. In the previous section, the three line plots in one figure and the side-by-side bar plots both display such comparisons. With three (or more) qualitative features, we can continue to subdivide the data according to the combinations of levels of the features and compare these proportions using line plots, dot plots, side-by-side bar charts, and so forth. But these plots tend to get increasingly difficult to understand with further subdivisions." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"With qualitative data, the bar plot serves a similar role to the histogram. The bar plot gives a visual presentation of the 'popularity' or frequency of different groups. However, we cannot interpret the shape of the bar plot in the same way as a histogram. Tails and symmetry do not make sense in this setting. Also, the frequency of a category is represented by the height of the bar, and the width carries no information. The two bar charts that follow display identical information about the number of breeds in a category; the only difference is in the width of the bars. In the extreme, the rightmost plot eliminates the bars entirely and represents each count by a single dot." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

✏️Emile Cheysson - Collected Quotes

"If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at a glance the series of phenomena, but also to signal the correspondence or anomalies, to find the causes, to identify the laws." (Émile Cheysson, circa 1877)

"Geometric statistics compel the merchant who wishes to consult it to undertake a careful self-examination and deep investigation—steps he might not have felt necessary without this pressing summons. Indeed, this may be one of the method’s greatest benefits: it forces him to scrutinize countless factors that surround him daily yet go unnoticed, and to become aware of all the elements that, sometimes without his knowledge, influence the final outcome. It does not settle for approximations; before offering its insights, it demands to be informed with both abundance and accuracy." (Emile Cheysson, "La Statistique géométrique", 1888)

"It is this combination of observation at the foundation and geometry at the summit that I wished to express by naming this method Geometric Statistics. It cannot be subject to the usual criticisms directed at the use of pure mathematics in economic matters, which are said to be too complex to be confined within a formula." (Emile Cheysson, "La Statistique géométrique", 1888)

"It then becomes a method of graphical interpolation or extrapolation, which involves hypothetically extending a curve within or beyond the range of known data points, assuming the continuity of its pattern. In this way, one can fill in gaps in past observations and even probe the depths of the future." (Emile Cheysson, "La Statistique géométrique", 1888)

"This method is what I call Geometric Statistics. But despite its somewhat forbidding name - which I’ll explain in a moment - it is not a mathematical abstraction or a mere intellectual curiosity accessible only to a select few. It is intended, if not for all merchants and industrialists, then at least for that elite who lead the masses behind them. Practice is both its starting point and its destination. It was inspired in me more than fifteen years ago by the demands of the profession, and if I’ve decided to present it today, it’s because I’ve since verified its advantages through various applications, both in private industry and in public service." (Emile Cheysson, "La Statistique géométrique", 1888)

"Whenever it is a matter of resolving delicate questions where the solution depends on contradictory elements whose outcome is difficult to determine, Geometric Statistics has a clear role to play and can intervene usefully." (Emile Cheysson, "La Statistique géométrique", 1888)

"Graphical statistics thus possesses a variety of resources that it deploys depending on the case, in order to find the most expressive and visually appealing way to depict the phenomenon. One must especially avoid trying to convey too much at once and becoming obscure by striving for completeness. Its main virtue - or one might say, its true reason for being - is clarity. If a diagram becomes so cluttered that it loses its clarity, then it is better to use the numerical table it was meant to translate." (Emile Cheysson, "Album de statistique graphique", 1889)

"This method not only has the advantage of appealing to the senses as well as to the intellect, and of illustrating facts and laws to the eye that would be difficult to uncover in long numerical tables. It also has the privilege of escaping the obstacles that hinder the easy dissemination of scientific work - obstacles arising from the diversity of languages and systems of weights and measures among different nations. These obstacles are unknown to drawing. A diagram is not German, English, or Italian; everyone immediately grasps its relationships of scale, area, or color. Graphical statistics are thus a kind of universal language, allowing scholars from all countries to freely exchange their ideas and research, to the great benefit of science itself." (Emile Cheysson, "Album de statistique graphique", 1889)

"Today, there is hardly any field of human activity that does not make use of graphical statistics. Indeed, it perfectly meets a dual need of our time: the demand for information that is both rapid and precise. Graphical methods fulfill these two conditions wonderfully. They allow us not only to grasp an entire series of phenomena at a glance, but also to highlight relationships or anomalies, identify causes, and extract underlying laws. They advantageously replace long tables of numbers, so that - without compromising the precision of statistics - they broaden and popularize its benefits." (Emile Cheysson, "Album de statistique graphique", 1889)

"When a law is contained in figures, it is buried like metal in an ore; it is necessary to extract it. This is the work of graphical representation. It points out the coincidences, the relationships between phenomena, their anomalies, and we have seen what a powerful means of control it puts in the hands of the statistician to verify new data, discover and correct errors with which they have been stained." (Emile Cheysson, "Les méthodes de la statistique", 1890)

Sources: Bibliothèque nationale de France

25 December 2006

✏️Daniel J Levitin - Collected Quotes

"A well-designed graph clearly shows you the relevant end points of a continuum. This is especially important if you’re documenting some actual or projected change in a quantity, and you want your readers to draw the right conclusions. […]" (Daniel J Levitin, "Weaponized Lies", 2017)

"Collecting data through sampling therefore becomes a never-ending battle to avoid sources of bias. [...] While trying to obtain a random sample, researchers sometimes make errors in judgment about whether every person or thing is equally likely to be sampled." (Daniel J Levitin, "Weaponized Lies", 2017)

"GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed." (Daniel J Levitin, "Weaponized Lies", 2017)

"How do you know when a correlation indicates causation? One way is to conduct a controlled experiment. Another is to apply logic. But be careful - it’s easy to get bogged down in semantics." (Daniel J Levitin, "Weaponized Lies", 2017)

"In statistics, the word 'significant' means that the results passed mathematical tests such as t-tests, chi-square tests, regression, and principal components analysis (there are hundreds). Statistical significance tests quantify how easily pure chance can explain the results. With a very large number of observations, even small differences that are trivial in magnitude can be beyond what our models of change and randomness can explain. These tests don’t know what’s noteworthy and what’s not - that’s a human judgment." (Daniel J Levitin, "Weaponized Lies", 2017)

"Infographics are often used by lying weasels to shape public opinion, and they rely on the fact that most people won’t study what they’ve done too carefully." (Daniel J Levitin, "Weaponized Lies", 2017)

"Just because there’s a number on it, it doesn’t mean that the number was arrived at properly. […] There are a host of errors and biases that can enter into the collection process, and these can lead millions of people to draw the wrong conclusions. Although most of us won’t ever participate in the collection process, thinking about it, critically, is easy to learn and within the reach of all of us." (Daniel J Levitin, "Weaponized Lies", 2017)

"Many of us feel intimidated by numbers and so we blindly accept the numbers we’re handed. This can lead to bad decisions and faulty conclusions. We also have a tendency to apply critical thinking only to things we disagree with. In the current information age, pseudo-facts masquerade as facts, misinformation can be indistinguishable from true information, and numbers are often at the heart of any important claim or decision. Bad statistics are everywhere." (Daniel J Levitin, "Weaponized Lies", 2017)

"Measurements must be standardized. There must be clear, replicable, and precise procedures for collecting data so that each person who collects it does it in the same way." (Daniel J Levitin, "Weaponized Lies", 2017)

"Most of us have difficulty figuring probabilities and statistics in our heads and detecting subtle patterns in complex tables of numbers. We prefer vivid pictures, images, and stories. When making decisions, we tend to overweight such images and stories, compared to statistical information. We also tend to misunderstand or misinterpret graphics." (Daniel J Levitin, "Weaponized Lies", 2017)

"One kind of probability - classic probability - is based on the idea of symmetry and equal likelihood […] In the classic case, we know the parameters of the system and thus can calculate the probabilities for the events each system will generate. […] A second kind of probability arises because in daily life we often want to know something about the likelihood of other events occurring […]. In this second case, we need to estimate the parameters of the system because we don’t know what those parameters are. […] A third kind of probability differs from these first two because it’s not obtained from an experiment or a replicable event - rather, it expresses an opinion or degree of belief about how likely a particular event is to occur. This is called subjective probability […]." (Daniel J Levitin, "Weaponized Lies", 2017)

"One way to lie with statistics is to compare things - datasets, populations, types of products - that are different from one another, and pretend that they’re not. As the old idiom says, you can’t compare apples with oranges." (Daniel J Levitin, "Weaponized Lies", 2017)

"Probabilities allow us to quantify future events and are an important aid to rational decision making. Without them, we can become seduced by anecdotes and stories." (Daniel J Levitin, "Weaponized Lies", 2017)

"Samples give us estimates of something, and they will almost always deviate from the true number by some amount, large or small, and that is the margin of error. […] The margin of error does not address underlying flaws in the research, only the degree of error in the sampling procedure. But ignoring those deeper possible flaws for the moment, there is another measurement or statistic that accompanies any rigorously defined sample: the confidence interval." (Daniel J Levitin, "Weaponized Lies", 2017)

"Statistics, because they are numbers, appear to us to be cold, hard facts. It seems that they represent facts given to us by nature and it’s just a matter of finding them. But it’s important to remember that people gather statistics. People choose what to count, how to go about counting, which of the resulting numbers they will share with us, and which words they will use to describe and interpret those numbers. Statistics are not facts. They are interpretations. And your interpretation may be just as good as, or better than, that of the person reporting them to you." (Daniel J Levitin, "Weaponized Lies", 2017)

"The margin of error is how accurate the results are, and the confidence interval is how confident you are that your estimate falls within the margin of error." (Daniel J Levitin, "Weaponized Lies", 2017)
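For a single sample proportion, the standard textbook calculation connects the two ideas. The chosen confidence level (here 95%) sets the multiplier z, and the margin of error is z times the standard error. The sketch below uses the normal approximation with a hypothetical poll result.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 52% of 1,000 respondents answer "yes".
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(round(moe, 3))                  # 0.031
print(round(low, 3), round(high, 3))  # 0.489 0.551
```

Raising the confidence level to 99% widens the interval for the same data, which is the trade-off the two terms describe.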

"The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians." (Daniel J Levitin, "Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"We are a storytelling species, and a social species, easily swayed by the opinions of others. We have three ways to acquire information: We can discover it ourselves, we can absorb it implicitly, or we can be told it explicitly. Much of what we know about the world falls in this last category - somewhere along the line, someone told us a fact or we read about it, and so we know it only second-hand. We rely on people with expertise to tell us." (Daniel J Levitin, "Weaponized Lies", 2017)

"We use the word probability in different ways to mean different things. It’s easy to get swept away thinking that a person means one thing when they mean another, and that confusion can cause us to draw the wrong conclusion." (Daniel J Levitin, "Weaponized Lies", 2017) 

✏️Leland Wilkinson - Collected Quotes

"A grammar of graphics facilitates coordinated activity in a set of relatively autonomous components. This grammar enables us to develop a system in which adding a graphic to a frame (say, a surface) requires no adjustments or changes in definitions other than the simple message 'add this graphic'. Similarly, we can remove graphics, transform scales, permute attributes, and make other alterations without redefining the basic structure." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"A graph is a set of points. A mathematical graph cannot be seen. It is an abstraction. A graphic, however, is a physical representation of a graph. This representation is accomplished by realizing graphs with aesthetic attributes such as size or color." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Comparing series visually can be misleading […]. Local variation is hidden when scaling the trends. We first need to make the series stationary (removing trend and/or seasonal components and/or differences in variability) and then compare changes over time. To do this, we log the series (to equalize variability) and difference each of them by subtracting last year’s value from this year’s value." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
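The log-then-difference step from the quote fits in a few lines. The two annual series are hypothetical. Both grow 10% a year on very different scales, and after logging and subtracting last year's value from this year's they become directly comparable.

```python
import math

def log_diff(series):
    """Log each value, then subtract last year's log value from this year's."""
    logs = [math.log(v) for v in series]
    return [curr - prev for prev, curr in zip(logs, logs[1:])]

# Two hypothetical annual series, both growing 10% per year.
small = [100, 110, 121, 133.1]
large = [10_000, 11_000, 12_100, 13_310]

print([round(d, 4) for d in log_diff(small)])  # [0.0953, 0.0953, 0.0953]
print([round(d, 4) for d in log_diff(large)])  # [0.0953, 0.0953, 0.0953]
```

Plotted raw on a shared axis, `small` would look flat next to `large`. The differenced logs show the two series changing at the same rate.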

"Coordinates are sets that locate points in space. These sets are usually numbers grouped in tuples, one tuple for each point. Because spaces can be defined as sets of geometric objects plus axioms defining their behavior, coordinates can be thought of more generally as schemes for mapping elements of sets to geometric objects." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Decision-makers process priors incorrectly in several ways. First, people tend to assess probability from the representativeness of an outcome rather than from its frequency. When supporting information is added to make an outcome more coherent and congruent with a representative mental image, people tend to judge the outcome more probable, even though the added qualifications and constraints by definition make it less probable. […] Second, humans often judge relative probability of outcomes by assessing similarity rather than frequency. […] Third, when given worthless evidence in a Bayesian framework, people tend to ignore prior probabilities and use the worthless evidence." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Estimating the missing values in a dataset solves one problem - imputing reasonable values that have well-defined statistical properties. It fails to solve another, however - drawing inferences about parameters in a model fit to the estimated data. Treating imputed values as if they were known (like the rest of the observed data) causes confidence intervals to be too narrow and tends to bias other estimates that depend on the variability of the imputed values (such as correlations)." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
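Mean imputation shows the narrowing effect directly. It is only one simple scheme, chosen here for illustration, and Wilkinson's point is not limited to it. The values below are hypothetical. Filling the gaps with the observed mean adds no new variation but increases the apparent sample size, so a naive standard error shrinks.

```python
import math
import statistics

observed = [4.0, 7.0, 5.0, 9.0, 6.0]  # hypothetical observed values
n_missing = 5                         # hypothetical count of missing values

# Mean imputation: fill every gap with the observed mean (6.2 here).
imputed = observed + [statistics.mean(observed)] * n_missing

def naive_se(values):
    """Standard error of the mean, treating every value as truly observed."""
    return statistics.stdev(values) / math.sqrt(len(values))

print(round(naive_se(observed), 3))  # 0.86
print(round(naive_se(imputed), 3))   # 0.406
```

A confidence interval built from the second number would be less than half as wide as one built from the observed data alone, even though no real information was added.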

"Human decision-making in the face of uncertainty is not only prone to error, it is also biased against Bayesian principles. We are not randomly suboptimal in our decisions. We are systematically suboptimal." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"It is not always convenient to remember that the right model for a population can fit a sample of data worse than a wrong model - even a wrong model with fewer parameters. We cannot rely on statistical diagnostics to save us, especially with small samples. We must think about what our models mean, regardless of fit, or we will promulgate nonsense." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Taxonomies are useful to scientists when they lead to new theory or stimulate insights into a problem that previous theorizing might conceal. Classification for its own sake, however, is as unproductive in design as it is in science. In design, objects are only as useful as the system they support. And the test of a design is its ability to handle scenarios that include surprises, exceptions, and strategic reversals." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The consequence of distinguishing statistical methods from the graphics displaying them is to separate form from function. That is, the same statistic can be represented by different types of graphics, and the same type of graphic can be used to display two different statistics. […] This separability of statistical and geometric objects is what gives a system a wide range of representational opportunities." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The grammar of graphics takes us beyond a limited set of charts (words) to an almost unlimited world of graphical forms (statements). The rules of graphics grammar are sometimes mathematical and sometimes aesthetic. Mathematics provides symbolic tools for representing abstractions. Aesthetics, in the original Greek sense, offers principles for relating sensory attributes (color, shape, sound, etc.) to abstractions. In modern usage, aesthetics can also mean taste." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The ordinary histogram is constructed by binning data on a uniform grid. Although this is probably the most widely used statistical graphic, it is one of the more difficult ones to compute. Several problems arise, including choosing the number of bins (bars) and deciding where to place the cutpoints between bars." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
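The cutpoint problem can be seen with a tiny hypothetical dataset. Binning the same values on a uniform grid of width 1, but with the grid shifted by half a unit, produces differently shaped histograms. The `bin_counts` helper is written for this sketch and is not from the book.

```python
def bin_counts(data, width, origin):
    """Count values in uniform bins [origin + k*width, origin + (k+1)*width),
    returning counts from the lowest to the highest occupied bin."""
    counts = {}
    for x in data:
        k = int((x - origin) // width)
        counts[k] = counts.get(k, 0) + 1
    return [counts.get(k, 0) for k in range(min(counts), max(counts) + 1)]

data = [1.1, 1.9, 2.1, 2.4, 2.9, 3.6, 4.2]  # hypothetical values

print(bin_counts(data, width=1.0, origin=0.0))  # [2, 3, 1, 1]
print(bin_counts(data, width=1.0, origin=0.5))  # [1, 3, 1, 2]
```

The data and the bin width are the same in both calls. Only the cutpoints moved, and the apparent shape of the distribution changed with them.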

"The plot tells us the data are granular in the data source, something we could not ascertain with the histogram. There is an important lesson here. Statistics texts and statistical packages that recommend the histogram as the graphical starting point for a data analysis are giving bad advice. The same goes for kernel density estimates. These are appropriate second stages for graphical data analysis. The best starting point for getting a sense of the distribution of a variable is a tally, stem-and-leaf, or a dot plot. A dot plot is a special case of a tally (perhaps best thought of as a delta-neighborhood tally). Once we see that the data are not granular, we may move on to a histogram or kernel density, which smooths the data more than a dot plot." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The visual representation of a scale - an axis with ticks - looks like a ladder. Scales are the types of functions we use to map varsets to dimensions. At first glance, it would seem that constructing a scale is simply a matter of selecting a range for our numbers and intervals to mark ticks. There is more involved, however. Scales measure the contents of a frame. They determine how we perceive the size, shape, and location of graphics. Choosing a scale (even a default decimal interval scale) requires us to think about what we are measuring and the meaning of our measurements. Ultimately, that choice determines how we interpret a graphic." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"To analyze means to untangle. Even when we 'let the data speak for themselves', we need to untangle some aspect of the data before displaying things in a graphic. The more analytics we can include in the process of displaying graphics, the more flexibility our tools will have." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)


About Me

Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.