SQL Troubles

06 December 2006

✏️Jennifer George-Palilonis - Collected Quotes

"[…] a graphic with loose, incomplete information that is too verbose, vague or passive can actually impede your audience’s ability to make sense of the information at hand. If the graphic confuses or frustrates the audience, you’re likely to do more harm than good, leave them with more questions than answers and essentially turn them away from your publication." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Actually composing an information graphic - putting all of the pieces together in a rhythmic, orderly, interesting design - is equal in importance to writing the text and creating the main illustrations. In fact, the design of the graphic can have a direct impact on an audience’s ability to follow the information that is presented in an efficient and logical manner. Design can also affect the level of meaning and understanding an audience will take away from the graphic. Thus, understanding how to compose/design an information graphic is paramount to a graphics reporter’s ability to succeed." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Although as a graphics reporter you may not find yourself in ethical dilemmas as regularly as other journalists, there are some common scenarios that pop up from time to time. The first of these is the tendency to be faced with incomplete data and the temptation to “fill in the blanks” in order to complete your graphic. When information is incomplete or seems to be misleading, you must make every effort to find the missing links through more research and fact-finding. Often, you can consult the original source(s) of the data and, by asking a few more questions, fill in the missing pieces of the puzzle. If this doesn’t work, there are often ways to present the information you do have in a way that provides the reader with a bit more detail, while at the same time, makes it clear that there, in fact, are some missing numbers." (Jennifer George-Palilonis,"A Practical Guide to Graphics Reporting", 2006)

"An infographic’s headline should summarize the main point of the presentation. Any introductory text or 'chatter' should explain the most newsworthy information within the context of the visual story being told; i.e., is the what of the story most important? Is the how of the story most important?, etc." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Believe it or not, it’s easy to make statistics lie. It’s called massaging the facts, and people do it all the time. […] To avoid this, graphics reporters should develop a keen eye for spotting problems with statistics in order to avoid the embarrassment and possible liability of reporting incorrect information." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Graphics should be planned, written and developed to stand alone. Even when a graphic is accompanied by a story, we can’t always count on the reader to get that far. Scanning readers often don’t engage with stories at all. Rather, they browse the page, often reading only display type and visual elements. And, even those who intend to read the story often engage with the graphics first because they tend to be more eye-catching. In both cases, you simply can’t create a graphic that isn’t complete without the story. Readers should finish an information graphic feeling confident that they understand the information it presents. This isn’t to say that you must tell the entire story with the graphic. However, the portions of the story that are represented in the graphic must be complete and clear." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Just as rhythm in music can move you to dance, sway or tap your foot, visual rhythm is the combination and arrangement of elements that moves your eyes through a graphic presentation. Visual rhythm can be achieved by repeating patterns that are similar in size, shape or color, by alternating elements that contrast one another in some way or by placing elements in a manner that creates progression, such as small to large or light to dark." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Look for comparisons, dates or other organizational facts outlined in the story. Who are the key players, and why? What are the key dates? How did we get here? Where do we go from here? What’s at issue, and what does it mean for the reader? These types of questions often lead to discovering graphics potential for a story, and by presenting the answers in a graphic manner, you provide readers with a quickly accessible and easily understood context for the rest of the story." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Make use of a simple data metaphor. Regardless of the concept you are trying to convey with an information graphic, you must make sure that the visual metaphor (i.e., a circle to represent a whole, as with a pie chart) be clear and logical. Don’t get so caught up in being clever that you make illogical comparisons or use unclear metaphors. In other words, don’t make your readers have to think too hard to get the point. They’ll appreciate you for it!" (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Proportion is important to information graphics because it helps create a sense of hierarchy and order among the elements. […] Proportion is also achieved by incorporating elements of varying sizes or shapes in a layout. This practice allows us to compare them to one another and make visual judgments about their relative sizes and shapes or proportion. Adhering to proportional size and shape relationships will result in a more interesting overall visual effect than if all elements are more or less the same size. Proportion is also useful in contributing to a sense of depth." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"[…] rhythm can be achieved in a variety of different ways. Asymmetrical balance is most commonly used in the design of graphics because it is the most effective way to move the eye around a graphic. Repetition in the placement of like elements or even the same element can also establish rhythm in a graphic. The similarity of the elements makes a visual connection for the eye and moves it from one to the next. Chronological, numerical or alphabetic placement of elements is also a simple way to create rhythm. This placement creates an obvious order for the eye to follow. Finally, integrating visual elements that are directional in nature often helps lead the eye in a specific direction. This could be something as simple as the use of an arrow in a design." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Specific numbers, visual descriptions of objects or events and identifiable locations don’t always jump out, and a graphic may not always present itself right away. A good graphics reporter will often discover graphics potential in less obvious ways. Is the explanation in a story getting bogged down and hard to follow? If so, can the information be organized differently? Perhaps in a more graphic manner? Is there information that hat can be conveyed conceptually to put a thought or idea into a more visual perspective? Visual metaphors (or 'data metaphors' in the case of mathematical or quantifiable information) often make it easier for people to digest information." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Text should provide the information and context that visuals cannot. By their nature, visuals can be ambiguous; well-written sentences are not. Infographics - whether statistical, cartographic or diagrammatic - are meant to demonstrate data visually and holistically. So the visuals in an infographic should do as much explanatory 'lifting' as possible, allowing words only to qualify, specify, summarize and organize." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

05 December 2006

✏️Dennis K Lieu - Collected Quotes

"Being a good team member takes work. Most people are used to working on their own - making decisions, prioritizing tasks, and being accountable for their own work. Working with others requires a different approach than working alone. To be a successful part of a team, you need to consider several issues. You should be prepared not to be in charge of everything. For some people, this requires a great deal of effort; for other people, it is less taxing. At times, you will be the supervisor; other times you will be supervised. You need to be flexible and understand that a team consisting only of leaders (or only of followers) is not likely to perform well." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Charts are used to represent quantitative data in a graphic format. A chart visually illustrates relationships between numbers. When creating a chart, keep in mind that the goal is to represent the data in a simplified and appealing way so as not to muddle the message the chart is meant to convey." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Design is a goal-oriented, problem-solving activity that typically takes many iterations - teams rarely come up with the 'optimal' design the first time around. [...] With each model, improvements were made to the original design such that the minivans of today are much improved compared to the initial product. The key activity in the design process is the development and testing of a descriptive model of the finished product before the product is finally manufactured or constructed." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Designers are responsible for the project’s fit and finish, that is, specifying the geometry and sizes of components so they properly mate with each other and are ergonomically and aesthetically acceptable within the operating environment." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Information graphics are an essential component of technical communication. Very few technical documents or presentations can be considered complete without graphical elements to present some essential data. Because engineers are visually oriented, graphic aids allow their thoughts and ideas to be better understood by other engineers. Information graphics are essential in presenting data because they simplify the content, offer a visually pleasing alternative to gray text in a proposal or an article, and thereby invite interest." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Most importantly, prepare to learn how to be a team member. Share your strengths with the team and be willing to contribute. Remember, the combined efforts of all team members should yield a better outcome than the efforts of one individual. Learn new team skills and be adaptable." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Reverse engineering is a systematic methodology for analyzing the design of an existing device or system, either as an approach to study the design or as a prerequisite for redesign. Reverse engineering essentially is a process used to gain information about the functionality and sizes of existing design components. [...] Reverse engineering is a technique within the practice of engineering design that can be useful in several ways. Reverse engineering can save time because there is no need to 'reinvent the wheel' when you can start from existing geometric data. The reverse engineering technique also can help an engineer develop a systematic approach to thinking about and improving the design of devices and systems." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Tables work in a variety of situations because they convey large amounts of data in a condensed fashion. Use tables in the following situations: (1) to structure data so the reader can easily pick out the information desired, (2) to display in a chart when the data contains too many variables or values, and (3) to display exact values that are more important than a visual moment in time." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"The data [in tables] should not be so spaced out that it is difficult to follow or so cramped that it looks trapped. Keep columns close together; do not spread them out more than is necessary. If the columns must be spread out to fit a particular area, such as the width of a page, use a graphic device such as a line or screen to guide the reader’s eye across the row." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Whereas charts generally focus on a trend or comparison, tables organize data for the reader to scan. Tables present data in an easy-read-format, or matrix. Tables arrange data in columns or rows so readers can make side-by-side comparisons. Tables work for many situations because they convey large amounts of data and have several variables for each item. Tables allow the reader to focus quickly on a specific item by scanning the matrix or to compare multiple items by scanning the rows or columns." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

✏️Carl T Bergstrom - Collected Quotes

"[...] although numbers may seem to be pure facts that exist independently from any human judgment, they are heavily laden with context and shaped by decisions - from how they are calculated to the units in which they are expressed." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Another problem is that while data visualizations may appear to be objective, the designer has a great deal of control over the message a graphic conveys. Even using accurate data, a designer can manipulate how those data make us feel. She can create the illusion of a correlation where none exists, or make a small difference between groups look big." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Confirmation bias is the tendency to notice, believe, and share information that is consistent with our preexisting beliefs. When a claim confirms our beliefs about the world, we are more prone to accept it as true and less inclined to challenge it as possibly false." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Correlation doesn't imply causation - but apparently it doesn't sell newspapers either."(Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"For numbers to be transparent, they must be placed in an appropriate context. Numbers must presented in a way that allows for fair comparisons." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"If the data that go into the analysis are flawed, the specific technical details of the analysis don’t matter. One can obtain stupid results from bad data without any statistical trickery. And this is often how bullshit arguments are created, deliberately or otherwise. To catch this sort of bullshit, you don’t have to unpack the black box. All you have to do is think carefully about the data that went into the black box and the results that came out. Are the data unbiased, reasonable, and relevant to the problem at hand? Do the results pass basic plausibility checks? Do they support whatever conclusions are drawn?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"If you study one group and assume that your results apply to other groups, this is extrapolation. If you think you are studying one group, but do not manage to obtain a representative sample of that group, this is a different problem. It is a problem so important in statistics that it has a special name: selection bias. Selection bias arises when the individuals that you sample for your study differ systematically from the population of individuals eligible for your study." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Jargon may facilitate technical communication within a field, but it also serves to exclude those who have not been initiated into the inner circle of a discipline." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Machines are not free of human biases; they perpetuate them, depending on the data they’re fed. [...] When we train machines to make decisions based on data that arise in a biased society, the machines learn and perpetuate those same biases." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Mathiness refers to formulas and expressions that may look and feel like math-even as they disregard the logical coherence and formal rigor of actual mathematics. […] These equations make mathematical claims that cannot be supported by positing formal relationships - variables interacting multiplicatively or additively, for example - between ill-defined and impossible-to-measure quantities. In other words, mathiness, like truthiness and like bullshit, involves a disregard for logic or factual accuracy." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Numbers are ideal vehicles for promulgating bullshit. They feel objective, but are easily manipulated to tell whatever story one desires. Words are clearly constructs of human minds, but numbers? Numbers seem to come directly from Nature herself. We know words are subjective. We know they are used to bend and blur the truth. Words suggest intuition, feeling, and expressivity. But not numbers. Numbers suggest precision and imply a scientific approach. Numbers appear to have an existence separate from the humans reporting them." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"People do care about how they are measured. What can we do about this? If you are in the position to measure something, think about whether measuring it will change people’s behaviors in ways that undermine the value of your results. If you are looking at quantitative indicators that others have compiled, ask yourself: Are these numbers measuring what they are intended to measure? Or are people gaming the system and rendering this measure useless?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Reporting numbers as percentages can obscure important changes in net values. […] Percentage calculations can give strange answers when any of the numbers involved are negative." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"So what does it mean to tell an honest story? Numbers should be presented in ways that allow meaningful comparisons." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"The problem is the hype, the notion that something magical will emerge if only we can accumulate data on a large enough scale. We just need to be reminded: Big data is not better; it’s just bigger. And it certainly doesn’t speak for itself." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"There are many ways for error to creep into facts and figures that seem entirely straightforward. Quantities can be miscounted. Small samples can fail to accurately reflect the properties of the whole population. Procedures used to infer quantities from other information can be faulty. And then, of course, numbers can be total bullshit, fabricated out of whole cloth in an effort to confer credibility on an otherwise flimsy argument. We need to keep all of these things in mind when we look at quantitative claims. They say the data never lie - but we need to remember that the data often mislead." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well - but it may do so by chance. As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"To tell an honest story, it is not enough for numbers to be correct. They need to be placed in an appropriate context so that a reader or listener can properly interpret them." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"We all know that the numerical values on each side of an equation have to be the same. The key to dimensional analysis is that the units have to be the same as well. This provides a convenient way to keep careful track of units when making calculations in engineering and other quantitative disciplines, to make sure one is computing what one thinks one is computing. When an equation exists only for the sake of mathiness, dimensional analysis often makes no sense." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Well-designed data graphics provide readers with deeper and more nuanced perspectives, while promoting the use of quantitative information in understanding the world and making decisions." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Without knowing the source and context, a particular statistic is worth little. Yet numbers and statistics appear rigorous and reliable simply by virtue of being quantitative, and have a tendency to spread." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

✏️Nancy Organ - Collected Quotes

"A line graph looks similar to a scatterplot, but each point is connected to form a wiggly line that runs from left to right. The values on the x-axis are either ordinal or numerical data that tell us the order of each data point. The connections between each point make it easier to see how much the values on the y-axis change from one point to the next. Because line charts show data in a particular order, a line in a line chart can only have one point for each value on the x-axis." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"[...] a sunburst chart where the center is either a pie chart or a donut chart of the biggest categories surrounded in donuts that show each of the other levels. The outside donut has the leaf nodes [...]" (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Another way to make points visible on a crowded visualization is to change the opacity of the points. This makes it easier to see where the points overlap. Opacity is a way of describing how hard it is to see though something. If it’s hard to see through, then it’s opaque or has a high opacity. Transparency is the opposite: if something is easy to see through, you can say that it is transparent." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Fuel gauges are another common place to see data shown with angles. Depending on the direction that the needle points and how slanted it is, we can decide if it’s time to stop at the gas station. [...] Many kinds of meters, gauges, dials, knobs, and faucets tell us what’s happening by the angle of a needle, marker, or handle." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"[...] meter charts are a third type of visualization that uses angles. Meter charts, which are sometimes called gauge charts, are named after things like electric meters and gas gauges. These visualizations are shaped like donut charts with a bite taken out. They’re mostly used for showing progress toward a goal, or how empty, full, or extreme something is." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Meter charts use angle and sometimes color to show amounts of something or progress toward a goal." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Networks or network graphs show relationships between things using nodes and links. Nodes are similar to the points on a scatterplot - they show one data point each. Links are the lines or arrows that show how the nodes connect or relate to each other." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Pie charts, donut charts, and meter charts are really just stacked bar charts that have been bent - but remember that they should always add up to 100%. Radar charts use angle to show categories and position to show amounts. You can also use angle with position to create charts that show movement, direction, or change - on maps and on graphs with number axes, as well as on visualizations with category axes." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Scatterplots and bubble charts are useful for showing the relationship between variables, but they aren’t very useful for showing the ordering of data points. If you want to understand the change from point-to-point in a certain order, you’ll need to use a different type of visualization, like a line graph." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Tree maps show networks by arranging rectangular branches and leaf nodes into a big block. Each branch in a tree map is packed with leaf nodes of the same color. Sometimes, the leaf nodes are in different sizes to show different amounts." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Visualizations that use different lengths of rectangles to show quantities are called bar charts. The rectangles in bar charts are called bars, and each bar represents a single category from a categorical variable. [...] When the bars in a bar chart are standing up, these visualizations are sometimes called column charts. Column charts and bar charts work in exactly the same way, but you might choose one over the other to fit better on a page or because it suits the data better." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

✏️John Hoffmann - Collected Quotes

"A useful way to think about tables and graphics is to visualize layers. Just as photographic files may be manipulated in photo editing software using layers, data presentations are constructed by imagining that layers of an image are placed one on top of another. There are three general layers that apply to visual data presentations: (a) a frame that is typically a rectangle or matrix, (b) axes and coordinate systems (for graphics), and (c) data presented as numbers or geometric objects." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Also known as line charts or line plots, this type of graphic displays a series of data points using line segments. […] Do not include too many lines, especially if they are difficult to distinguish. […] it is best to label the lines directly rather than use a legend. […] It is not a good idea to use line graphs with unordered categorical (nominal) data These graphs are simpler to understand when the data are ordered in some way. […] Visual acuity is enhanced when the lines do not touch the x- or y-axis […] There is no need, except under exceptional circumstances, to include a marker to show at what point the line matches a specific value of the x- and y-axes. Line graphs are designed to display patterns and trends rather than data points." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Clarity is related to two other principles of good data presentation: precision and efficiency. Precision refers to ensuring that the data are presented accurately with minimal error. This is a topic that is equally important to data presentation as it is to data management. Always keep in mind: don’t mislead the audience. As already mentioned, people can be fooled by visual images, but they can also be misled by the myth of the infallible graphic. This refers to a tendency to believe there is an important association among concepts simply because they are correlated." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Contrasts can be a help or a hindrance. Our eyes are drawn to bright colors on muted backgrounds. In addition, warm colors, such as red, are more likely to get attention than cool colors (although the relative brightness affects this phenomenon). Objects in color that are included in black and white or grayscale visuals are quite effective at drawing the eye. Thus, using color to highlight certain parts of a graphic or table can be valuable. However, avoid using these strategies if they will draw attention to extraneous or trivial parts of the data presentation." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"If colors are used for different bars in a graphic, use distinguishable shades of the same color rather than distinct colors. If lines are in color in a graph, use those that are easy to discriminate, such as red and blue. But be careful of lines that cross since a red line is perceived as in front of a blue line. If colors are employed in a table, used them to highlight the relevant comparisons you wish to make. […] Use colors to highlight important parts of the graphic. […] But be careful because this practice is easily abused." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"It is generally a good idea to avoid gridlines, vertical lines, and double lines. Use single horizontal lines to separate the title, headers, and content. Lines are also employed to identify column spanners, which are used to group particular columns of data." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Many data presentations spice up the image with background images, embedded visuals, ornate typeface, and bright colors. Our eyes may be drawn to these aspects, rather than to the patterns in the data, thus breaking the principles of clarity and efficiency. It is usually best to take out the clutter: remove the chartjunk." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"People tend to comprehend visual images quicker and with fewer errors than words on a page. Visual images also activate memories better than words." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Reference tables show a lot of data with a high degree of precision. They are designed generally to provide users with a way to fi nd particular pieces of data. […] Summary tables provide some type of extraction of data from a reference table or a spreadsheet. The data are usually manipulated, analyzed, or summarized in some way, such as by sorting or providing summary statistics (means, percentages, ranges). The results of statistical models are usually presented in research reports using this type of table." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Some experts argue that axes - in particular, the y-axis - should always begin at zero. However, when differences are small, yet the size of the numbers is relatively large, this can make detection difficult. On the other hand, viewers can be misled by manipulating the axes to magnify differences. One guideline is to always use a zero bottom point when judging absolute magnitudes. This is often the case in bar charts." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Titles should clearly specify the content of the table or the graphic. What is being presented? Means and standard deviations? Confidence intervals? Percentages? Trends over time? Furthermore, consider the context, such as when and where the data were gathered, as well as the name of the dataset if using secondary data (although the dataset may also be identified in a source note)." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Whichever scale is used to represent the data, it is important to keep it consistent in data presentations. The principles of clarity, precision, and efficiency are rarely met if the measurement scales change within tables." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

✏️Tamara Munzner - Collected Quotes

"A fundamental principle of design is to consider multiple alternatives and then choose the best, rather than to immediately fixate on one solution without considering any alternatives. One way to ensure that more than one possibility is considered is to explicitly generate multiple ideas in parallel. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"As with all design problems, vis design cannot be easily handled as a simple process of optimization because trade-offs abound. A design that does well by one measure will rate poorly on another. The characterization of trade-offs in the vis design space is a very open problem at the frontier of vis research." (Tamara Munzner, "Visualization Analysis and Design", 2014)

"Developing a clear understanding of the requirements of a particular target audience is a tricky problem for a designer. While it might seem obvious to you that it would be a good idea to understand requirements, it’s a common pitfall for designers to cut corners by making assumptions rather than actually engaging with any target users. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"Interactivity is crucial for building vis tools that handle complexity. When datasets are large enough, the limitations of both people and displays preclude just showing everything at once; interaction where user actions cause the view to change is the way forward. Moreover, a single static view can show only one aspect of a dataset. For some combinations of simple datasets and tasks, the user may only need to see a single visual encoding. In contrast, an interactively changing display supports many possible queries. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"Statistical characterization of datasets is a very powerful approach, but it has the intrinsic limitation of losing information through summarization. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"The effectiveness principle dictates that the importance of the attribute should match the salience of the channel; that is, its noticeability. In other words, the most important attributes should be encoded with the most effective channels in order to be most noticeable, and then decreasingly important attributes can be matched with less effective channels. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"The expressiveness principle dictates that the visual encoding should express all of, and only, the information in the dataset attributes. The most fundamental expression of this principle is that ordered data should be shown in a way that our perceptual system intrinsically senses as ordered. Conversely, unordered data should not be shown in a way that perceptually implies an ordering that does not exist. Violating this principle is a common beginner’s mistake in vis. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"The idiom of heatmaps is one of the simplest uses of the matrix alignment: each cell is fully occupied by an area mark encoding a single quantitative value attribute with color. […] The benefit of heatmaps is that visually encoding quantitative data with color using small area marks is very compact, so they are good for providing overviews with high information density. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"The idiom of parallel coordinates is an approach for visualizing many quantitative attributes at once using spatial position. As the name suggests, the axes are placed parallel to each other, rather than perpendicularly at right angles. While an item is shown with a dot in a scatterplot, with parallel coordinates a single item is represented by a jagged line that zigzags through the parallel axes, crossing each axis exactly once at the location of the item’s value for the associated attribute. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"The idiom of scatterplots encodes two quantitative value variables using both the vertical and horizontal spatial position channels, and the mark type is necessarily a point. Scatterplots are effective for the abstract tasks of providing overviews and characterizing distributions, and specifically for finding outliers and extreme values. Scatterplots are also highly effective for the abstract task of judging the correlation between two attributes. With this visual encoding, that task corresponds the easy perceptual judgement of noticing whether the points form a line along the diagonal. The stronger the correlation, the closer the points fall along a perfect diagonal line; positive correlation is an upward slope, and negative is downward." (Tamara Munzner, "Visualization Analysis and Design", 2014)

"The most powerful depth cue is occlusion, where some objects can not be seen because they are hidden behind others. The visible objects are interpreted as being closer than the occluded ones. The occlusion relationships between objects change as we move around; this motion parallax allows us to build up an understanding of the relative distances between objects in the world. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"The phenomenon of change blindness is that we fail to notice even quite drastic changes if our attention is directed elsewhere. […] Although we are very sensitive to changes at the focus of our attention, we are surprisingly blind to changes when our attention is not engaged. The difficulty of tracking complex and widespread changes across multiframe animations is one of the implications of change blindness for vis. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"Three high-level targets are very broadly relevant, for all kinds of data: trends, outliers, and features. A trend is a high-level characterization of a pattern in the data. Simple examples of trends include increases, decreases, peaks, troughs, and plateaus. Almost inevitably, some data doesn’t fit well with that backdrop; those elements are the outliers. The exact definition of features is task dependent, meaning any particular structures of interest." (Tamara Munzner, "Visualization Analysis and Design", 2014)

✏️John M Chambers - Collected Quotes

"At the heart of probabilistic statistical analysis is the assumption that a set of data arises as a sample from a distribution in some class of probability distributions. The reasons for making distributional assumptions about data are several. First, if we can describe a set of data as a sample from a certain theoretical distribution, say a normal distribution (also called a Gaussian distribution), then we can achieve a valuable compactness of description for the data. For example, in the normal case, the data can be succinctly described by giving the mean and standard deviation and stating that the empirical (sample) distribution of the data is well approximated by the normal distribution. A second reason for distributional assumptions is that they can lead to useful statistical procedures. For example, the assumption that data are generated by normal probability distributions leads to the analysis of variance and least squares. Similarly, much of the theory and technology of reliability assumes samples from the exponential, Weibull, or gamma distribution. A third reason is that the assumptions allow us to characterize the sampling distribution of statistics computed during the analysis and thereby make inferences and probabilistic statements about unknown aspects of the underlying distribution. For example, assuming the data are a sample from a normal distribution allows us to use the t-distribution to form confidence intervals for the mean of the theoretical distribution. A fourth reason for distributional assumptions is that understanding the distribution of a set of data can sometimes shed light on the physical mechanisms involved in generating the data." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Equal variability is not always achieved in plots. For instance, if the theoretical distribution for a probability plot has a density that drops off gradually to zero in the tails (as the normal density does), then the variability of the data in the tails of the probability plot is greater than in the center. Another example is provided by the histogram. Since the height of any one bar has a binomial distribution, the standard deviation of the height is approximately proportional to the square root of the expected height; hence, the variability of the longer bars is greater." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Frequently we can increase the informativeness of a graph by removing structure from the data once we have identified it, so that subsequent plots are free of its dominating influence and can help us see finer structure or subtler effects. This usually means (l) partitioning the data, or (2) plotting differences or ratios, or (3) fitting a model and taking the residuals as a new set of data for further study." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Generally speaking, a good display is one in which the visual impact of its components is matched to their importance in the context of the analysis. Consider the issue of overplotting." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Graphical methodology provides powerful diagnostic tools for conveying properties of the fitted regression, for assessing the adequacy of the fit, and for suggesting improvements. There is seldom any prior guarantee that a hypothesized regression model will provide a good description of the mechanism that generated the data. Standard regression models carry with them many specific assumptions about the relationship between the response and explanatory variables and about the variation in the response that is not accounted for by the explanatory variables. In many applications of regression there is a substantial amount of prior knowledge that makes the assumptions plausible; in many other applications the assumptions are made as a starting point simply to get the analysis off the ground. But whatever the amount of prior knowledge, fitting regression equations is not complete until the assumptions have been examined." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Missing data values pose a particularly sticky problem for symbols. For instance, if the ray corresponding to a missing value is simply left off of a star symbol, the result will be almost indistinguishable from a minimum (i.e., an extreme) value. It may be better either (i) to impute a value, perhaps a median for that variable, or a fitted value from some regression on other variables, (ii) to indicate that the value is missing, possibly with a dashed line, or (iii) not to draw the symbol for a particular observation if any value is missing." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Part of the strategy of regression modelling is to improve the model until the residuals look 'structureless', or like a simple random sample. They should only contain structure that is already taken into account (such as nonconstant variance) or imposed by the fitting process itself. By plotting them against a variety of original and derived variables, we can look for systematic patterns that relate to the model's adequacy. Although we talk about graphics for use after the model is fit, if problems with the fit are discovered at this stage of the analysis, We should take corrective action and refit the equation or a modified form of it." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Plotting on power-transformed scales (either cube roots or logs) is recommended only in those cases where the distribution is very asymmetric and the reference configuration for the untransformed plot would be a straight line through the origin." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Symmetry is also important because it can simplify our thinking about the distribution of a set of data. If we can establish that the data are (approximately) symmetric, then we no longer need to describe the shapes of both the right and left halves. (We might even combine the information from the two sides and have effectively twice as much data for viewing the distributional shape.) Finally, symmetry is important because many statistical procedures are designed for, and work best on, symmetric data." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"The information on a plot should be relevant to the goals of the analysis. This means that in choosing graphical methods we should match the capabilities of the methods to our needs in the context of each application. [...] Scatter plots, with the views carefully selected as in draftsman's displays, casement displays, and multiwindow plots, are likely to be more informative. We must be careful, however, not to confuse what is relevant with what we expect or want to find. Often wholly unexpected phenomena constitute our most important findings." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"The most important reason for portraying standard deviations is that they give us a sense of the relative variability of the points in different regions of the plot." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"The quantile plot is a good general display since it is fairly easy to construct and does a good job of portraying many aspects of a distribution. Three convenient features of the plot are the following: First, in constructing it, we do not make any arbitrary choices of parameter values or cell boundaries [...] and no models for the data are fitted or assumed. Second, like a table, it is not a summary but a display of all the data. Third, on the quantile plot every point is plotted at a distinct location, even if there are duplicates in the data. The number of points that can be portrayed without overlap is limited only by the resolution of the plotting device. For a high resolution device several hundred points distinguished." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"The truth is that one display is better than another if it leads to more understanding. Often a simpler display, one that tries to accomplish less at one time, succeeds in conveying more insight. In order to understand complicated or subtle structure in the data we should be prepared to look at complicated displays when necessary, but to see any particular type of structure we should use the simplest display that shows it."(John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"There are several reasons why symmetry is an important concept in data analysis. First, the most important single summary of a set of data is the location of the center, and when data meaning of 'center' is unambiguous. We can take center to mean any of the following things, since they all coincide exactly for symmetric data, and they are together for nearly symmetric data: (l) the Center Of symmetry. (2) the arithmetic average or center Of gravity, (3) the median or 50%. Furthermore, if data a single point of highest concentration instead of several (that is, they are unimodal), then we can add to the list (4) point of highest concentration. When data are far from symmetric, we may have trouble even agreeing on what we mean by center; in fact, the center may become an inappropriate summary for the data." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"We can gain further insight into what makes good p!ots by thinking about the process of visual perception. The eye can assimilate large amounts of visual information, perceive unanticipated structure, and recognize complex patterns; however, certain kinds of patterns are more readily perceived than others. If we thoroughly understood the interaction between the brain, eye, and picture, we could organize displays to take advantage of the things that the eye and brain do best, so that the potentially most important patterns are associated with the most easily perceived visual aspects in the display." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"When some interesting structure is seen in a plot, it is an advantage to be able to relate that structure back to the original data in a clear, direct, and meaningful way. Although this seems obvious, interpretability is at once one of the most important, difficult, and controversial issues." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

04 December 2006

✏️Lawrence C Hamilton - Collected Quotes

"Boxplots provide information at a glance about center (median), spread (interquartile range), symmetry, and outliers. With practice they are easy to read and are especially useful for quick comparisons of two or more distributions. Sometimes unexpected features such as outliers, skew, or differences in spread are made obvious by boxplots but might otherwise go unnoticed." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Comparing normal distributions reduces to comparing only means and standard deviations. If standard deviations are the same, the task even simpler: just compare means. On the other hand, means and standard deviations may be incomplete or misleading as summaries for nonnormal distributions." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Correlation and covariance are linear regression statistics. Nonlinearity and influential cases cause the same problems for correlations, and hence for principal components/factor analysis, as they do for regression. Scatterplots should be examined routinely to check for nonlinearity and outliers. Diagnostic checks become even more important with maximum-likelihood factor analysis, which makes stronger assumptions and may be less robust than principal components or principal factors." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Data analysis is rarely as simple in practice as it appears in books. Like other statistical techniques, regression rests on certain assumptions and may produce unrealistic results if those assumptions are false. Furthermore it is not always obvious how to translate a research question into a regression model." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Data analysis typically begins with straight-line models because they are simplest, not because we believe reality is inherently linear. Theory or data may suggest otherwise [...]" (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Exploratory regression methods attempt to reveal unexpected patterns, so they are ideal for a first look at the data. Unlike other regression techniques, they do not require that we specify a particular model beforehand. Thus exploratory techniques warn against mistakenly fitting a linear model when the relation is curved, a waxing curve when the relation is S-shaped, and so forth." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"If a distribution were perfectly symmetrical, all symmetry-plot points would be on the diagonal line. Off-line points indicate asymmetry. Points fall above the line when distance above the median is greater than corresponding distance below the median. A consistent run of above-the-line points indicates positive skew; a run of below-the-line points indicates negative skew." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Principal components and factor analysis are methods for data reduction. They seek a few underlying dimensions that account for patterns of variation among the observed variables underlying dimensions imply ways to combine variables, simplifying subsequent analysis. For example, a few combined variables could replace many original variables in a regression. Advantages of this approach include more parsimonious models, improved measurement of indirectly observed concepts, new graphical displays, and the avoidance of multicollinearity." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Principal components and principal factor analysis lack a well-developed theoretical framework like that of least squares regression. They consequently provide no systematic way to test hypotheses about the number of factors to retain, the size of factor loadings, or the correlations between factors, for example. Such tests are possible using a different approach, based on maximum-likelihood estimation." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Remember that normality and symmetry are not the same thing. All normal distributions are symmetrical, but not all symmetrical distributions are normal. With water use we were able to transform the distribution to be approximately symmetrical and normal, but often symmetry is the most we can hope for. For practical purposes, symmetry (with no severe outliers) may be sufficient. Transformations are not a magic wand, however. Many distributions cannot even be made symmetrical." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Visually, skewed sample distributions have one 'longer' and one 'shorter' tail. More general terms are 'heavier' and 'lighter' tails. Tail weight reflects not only distance from the center (tail length) but also the frequency of cases at that distance (tail depth, in a histogram). Tail weight corresponds to actual weight if the sample histogram were cut out of wood and balanced like a seesaw on its median (see next section). A positively skewed distribution is heavier to the right of the median; negative skew implies the opposite." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"A well-constructed graph can show several features of the data at once. Some graphs contain as much information as the original data, and so (unlike numerical summaries) do not actually simplify the data; rather, they express it in visual form. Unexpected or unusual features, which are not obvious within numerical tables, often jump to our attention once we draw a graph. Because the strengths and weaknesses of graphical methods are opposite those of numerical summary methods, the two work best in combination." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

"Data analysis [...] begins with a dataset in hand. Our purpose in data analysis is to learn what we can from those data, to help us draw conclusions about our broader research questions. Our research questions determine what sort of data we need in the first place, and how we ought to go about collecting them. Unless data collection has been done carefully, even a brilliant analyst may be unable to reach valid conclusions regarding the original research questions." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

"Variance and its square root, the standard deviation, summarize the amount of spread around the mean, or how much a variable varies. Outliers influence these statistics too, even more than they influence the mean. On the other hand. the variance and standard deviation have important mathematical advantages that make them (together with the mean) the foundation of classical statistics. If a distribution appears reasonably symmetrical, with no extreme outliers, then the mean and standard deviation or variance are the summaries most analysts would use." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

✏️William S Cleveland - Collected Quotes

"A graphical form that involves elementary perceptual tasks that lead to more accurate judgments than another graphical form (with the same quantitative in formation) will result in better organization and increase the chances of a correct perception of patterns and behavior." (William S Cleveland & Robert McGill, "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods", Journal of the American Statistical Association Vol. 79(387), 1984)

"Dot charts are suggested as replacements for bar charts. The replacements allow more effective visual decoding of the quantitative information and can be used for a wider variety of data sets." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"[...] error bars are more effectively portrayed on dot charts than on bar charts. […] On the bar chart the upper values of the intervals stand out well, but the lower values are visually deemphasized and are not as well perceived as a result of being embedded in the bars. This deemphasis does not occur on the dot chart." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"Experimentation with graphical methods for data presentation is important for improving graphical communication in science." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"For certain types of data structures, one cannot always use the most accurate elementary task, judging position along a common scale. But this is not true of the data represented in divided bar charts and pie charts; one can always represent such data along a common scale. A pie chart can always be replaced by a bar chart, thus replacing angle judgments by position judgments. […] A divided bar chart can always be replaced by a grouped bar chart; […]." (William S Cleveland & Robert McGill, "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods", Journal of the American Statistical Association Vol. 79(387), 1984)

"Of course increased bias does not necessarily imply less overall accuracy. The reasoning, however, is that the mechanism leading to bias might well lead to other types of inaccuracy as well." (William S Cleveland & Robert McGill, "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods", Journal of the American Statistical Association Vol. 79(387), 1984)

"One must be careful not to fall into a conceptual trap by adopting accuracy as a criterion. We are not saying that the primary purpose of a graph is to convey numbers with as many decimal places as possible. […] The power of a graph is its ability to enable one to take in the quantitative information, organize it, and see patterns and structure not readily revealed by other means of studying the data." (William S Cleveland & Robert McGill, "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods", Journal of the American Statistical Association Vol. 79(387), 1984)

"The bar of a bar chart has two aspects that can be used to visually decode quantitative information-size (length and area) and the relative position of the end of the bar along the common scale. The changing sizes of the bars is an important and imposing visual factor; thus it is important that size encode something meaningful. The sizes of bars encode the magnitudes of deviations from the baseline. If the deviations have no important interpretation, the changing sizes are wasted energy and even have the potential to mislead." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"The full break results in a graph with two juxtaposed panels. This use of juxtaposition to provide a full scale break, with each panel having a fill frame and its own scales, shows the scale break about as forcefully as possible and discourages mental visual connections by viewers and actual connections by authors." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"The logarithm is an extremely powerful and useful tool for graphical data presentation. One reason is that logarithms turn ratios into differences, and for many sets of data, it is natural to think in terms of ratios. […] Another reason for the power of logarithms is resolution. Data that are amounts or counts are often very skewed to the right; on graphs of such data, there are a few large values that take up most of the scale and the majority of the points are squashed into a small region of the scale with no resolution." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"[…] the partial scale break is a weak indicator that the reader can fail to appreciate fully; visually the graph is still a single panel that invites the viewer to see, inappropriately, patterns between the two scales. […] The partial scale break also invites authors to connect points across the break, a poor practice indeed; […]" (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"A connected graph is appropriate when the time series is smooth, so that perceiving individual values is not important. A vertical line graph is appropriate when it is important to see individual values, when we need to see short-term fluctuations, and when the time series has a large number of values; the use of vertical lines allows us to pack the series tightly along the horizontal axis. The vertical line graph, however, usually works best when the vertical lines emanate from a horizontal line through the center of the data and when there are no long-term trends in the data." (William S Cleveland, "The Elements of Graphing Data", 1985)

"A time series is a special case of the broader dependent-independent variable category. Time is the independent variable. One important property of most time series is that for each time point of the data there is only a single value of the dependent variable; there are no repeat measurements. Furthermore, most time series are measured at equally-spaced or nearly equally-spaced points in time." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Another way to obscure data is to graph too much. It is always tempting to show everything that comes to mind on a single graph, but graphing too much can result in less being seen and understood." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Do not allow data labels in the data region to interfere with the quantitative data or to clutter the graph. […] Avoid putting notes, keys, and markers in the data region. Put keys and markers just outside the data region and put notes in the legend or in the text." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Clear vision is a vital aspect of graphs. The viewer must be able to visually disentangle the many different items that appear on a graph." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Graphs that communicate data to others often must undergo reduction and reproduction; these processes, if not done with care, can interfere with visual clarity." (William S Cleveland, "The Elements of Graphing Data", 1985)

"In part, graphing data needs to be iterative because we often do not know what to expect of the data; a graph can help discover unknown aspects of the data, and once the unknown is known, we frequently find ourselves formulating a new question about the data. Even when we understand the data and are graphing them for presentation, a graph will look different from what we had expected; our mind's eye frequently does not do a good job of predicting what our actual eyes will see." (William S Cleveland, "The Elements of Graphing Data", 1985)

"It is common for positive data to be skewed to the right: some values bunch together at the low end of the scale and others trail off to the high end with increasing gaps between the values as they get higher. Such data can cause severe resolution problems on graphs, and the common remedy is to take logarithms. Indeed, it is the frequent success of this remedy that partly accounts for the large use of logarithms in graphical data display." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Iteration and experimentation are important for all of data analysis, including graphical data display. In many cases when we make a graph it is immediately clear that some aspect is inadequate and we regraph the data. In many other cases we make a graph, and all is well, but we get an idea for studying the data in a different way with a different graph; one successful graph often suggests another." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Make the data stand out and avoid superfluity are two broad strategies that serve as an overall guide to the specific principles […] The data - the quantitative and qualitative information in the data region - are the reason for the existence of the graph. The data should stand out. […] We should eliminate superfluity in graphs. Unnecessary parts of a graph add to the clutter and increase the difficulty of making the necessary elements - the data - stand out." (William S Cleveland, "The Elements of Graphing Data", 1985)

"No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails. Some display methods lead to efficient, accurate decoding, and others lead to inefficient, inaccurate decoding. It is only through scientific study of visual perception that informed judgments can be made about display methods." (William S Cleveland, "The Elements of Graphing Data", 1985)

"There are some who argue that a graph is a success only if the important information in the data can be seen within a few seconds. While there is a place for rapidly-understood graphs, it is too limiting to make speed a requirement in science and technology, where the use of graphs ranges from, detailed, in-depth data analysis to quick presentation." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Use a reference line when there is an important value that must be seen across the entire graph, but do not let the line interfere with the data." (William S Cleveland, "The Elements of Graphing Data", 1985)

"When a graph is constructed, quantitative and categorical information is encoded, chiefly through position, size, symbols, and color. When a person looks at a graph, the information is visually decoded by the person's visual system. A graphical method is successful only if the decoding process is effective. No matter how clever and how technologically impressive the encoding, it is a failure if the decoding process is a failure. Informed decisions about how to encode data can be achieved only through an understanding of the visual decoding process, which is called graphical perception." (William S Cleveland, "The Elements of Graphing Data", 1985)

"When magnitudes are graphed on a logarithmic scale, percents and factors are easier to judge since equal multiplicative factors and percents result in equal distances throughout the entire scale." (William S Cleveland, "The Elements of Graphing Data", 1985)

"When the data are magnitudes, it is helpful to have zero included in the scale so we can see its value relative to the value of the data. But the need for zero is not so compelling that we should allow its inclusion to ruin the resolution of the data on the graph." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Data that are skewed toward large values occur commonly. Any set of positive measurements is a candidate. Nature just works like that. In fact, if data consisting of positive numbers range over several powers of ten, it is almost a guarantee that they will be skewed. Skewness creates many problems. There are visualization problems. A large fraction of the data are squashed into small regions of graphs, and visual assessment of the data degrades. There are characterization problems. Skewed distributions tend to be more complicated than symmetric ones; for example, there is no unique notion of location and the median and mean measure different aspects of the distribution. There are problems in carrying out probabilistic methods. The distribution of skewed data is not well approximated by the normal, so the many probabilistic methods based on an assumption of a normal distribution cannot be applied." (William S Cleveland, "Visualizing Data", 1993)

"Fitting data means finding mathematical descriptions of structure in the data. An additive shift is a structural property of univariate data in which distributions differ only in location and not in spread or shape. […] The process of identifying a structure in data and then fitting the structure to produce residuals that have the same distribution lies at the heart of statistical analysis. Such homogeneous residuals can be pooled, which increases the power of the description of the variation in the data." (William S Cleveland, "Visualizing Data", 1993)

"Fitting is essential to visualizing hypervariate data. The structure of data in many dimensions can be exceedingly complex. The visualization of a fit to hypervariate data, by reducing the amount of noise, can often lead to more insight. The fit is a hypervariate surface, a function of three or more variables. As with bivariate and trivariate data, our fitting tools are loess and parametric fitting by least-squares. And each tool can employ bisquare iterations to produce robust estimates when outliers or other forms of leptokurtosis are present." (William S Cleveland, "Visualizing Data", 1993)

"If the underlying pattern of the data has gentle curvature with no local maxima and minima, then locally linear fitting is usually sufficient. But if there are local maxima or minima, then locally quadratic fitting typically does a better job of following the pattern of the data and maintaining local smoothness." (William S Cleveland, "Visualizing Data", 1993)

"Many good things happen when data distributions are well approximated by the normal. First, the question of whether the shifts among the distributions are additive becomes the question of whether the distributions have the same standard deviation; if so, the shifts are additive. […] A second good happening is that methods of fitting and methods of probabilistic inference, to be taken up shortly, are typically simple and on well understood ground. […] A third good thing is that the description of the data distribution is more parsimonious." (William S Cleveland, "Visualizing Data", 1993)

"Many of the applications of visualization in this book give the impression that data analysis consists of an orderly progression of exploratory graphs, fitting, and visualization of fits and residuals. Coherence of discussion and limited space necessitate a presentation that appears to imply this. Real life is usually quite different. There are blind alleys. There are mistaken actions. There are effects missed until the very end when some visualization saves the day. And worse, there is the possibility of the nearly unmentionable: missed effects." (William S Cleveland, "Visualizing Data", 1993)

"One important aspect of reality is improvisation; as a result of special structure in a set of data, or the finding of a visualization method, we stray from the standard methods for the data type to exploit the structure or the finding." (William S Cleveland, "Visualizing Data", 1993)

"Probabilistic inference is the classical paradigm for data analysis in science and technology. It rests on a foundation of randomness; variation in data is ascribed to a random process in which nature generates data according to a probability distribution. This leads to a codification of uncertainly by confidence intervals and hypothesis tests." (William S Cleveland, "Visualizing Data", 1993)

"Sometimes, when visualization thoroughly reveals the structure of a set of data, there is a tendency to underrate the power of the method for the application. Little effort is expended in seeing the structure once the right visualization method is used, so we are mislead into thinking nothing exciting has occurred." (William S Cleveland, "Visualizing Data", 1993)

"The logarithm is one of many transformations that we can apply to univariate measurements. The square root is another. Transformation is a critical tool for visualization or for any other mode of data analysis because it can substantially simplify the structure of a set of data. For example, transformation can remove skewness toward large values, and it can remove monotone increasing spread. And often, it is the logarithm that achieves this removal." (William S Cleveland, "Visualizing Data", 1993)

"The scatterplot is a useful exploratory method for providing a first look at bivariate data to see how they are distributed throughout the plane, for example, to see clusters of points, outliers, and so forth." (William S Cleveland, "Visualizing Data", 1993)

"There are two components to visualizing the structure of statistical data - graphing and fitting. Graphs are needed, of course, because visualization implies a process in which information is encoded on visual displays. Fitting mathematical functions to data is needed too. Just graphing raw data, without fitting them and without graphing the fits and residuals, often leaves important aspects of data undiscovered." (William S Cleveland, "Visualizing Data", 1993)

"Using area to encode quantitative information is a poor graphical method. Effects that can be readily perceived in other visualizations are often lost in an encoding by area." (William S Cleveland, "Visualizing Data", 1993)

"Visualization is an approach to data analysis that stresses a penetrating look at the structure of data. No other approach conveys as much information. […] Conclusions spring from data when this information is combined with the prior knowledge of the subject under investigation." (William S Cleveland, "Visualizing Data", 1993)

"Visualization is an effective framework for drawing inferences from data because its revelation of the structure of data can be readily combined with prior knowledge to draw conclusions. By contrast, because of the formalism of probabilistic methods, it is typically impossible to incorporate into them the full body of prior information." (William S Cleveland, "Visualizing Data", 1993)

"When distributions are compared, the goal is to understand how the distributions shift in going from one data set to the next. […] The most effective way to investigate the shifts of distributions is to compare corresponding quantiles." (William S Cleveland, "Visualizing Data", 1993)

"When the distributions of two or more groups of univariate data are skewed, it is common to have the spread increase monotonically with location. This behavior is monotone spread. Strictly speaking, monotone spread includes the case where the spread decreases monotonically with location, but such a decrease is much less common for raw data. Monotone spread, as with skewness, adds to the difficulty of data analysis. For example, it means that we cannot fit just location estimates to produce homogeneous residuals; we must fit spread estimates as well. Furthermore, the distributions cannot be compared by a number of standard methods of probabilistic inference that are based on an assumption of equal spreads; the standard t-test is one example. Fortunately, remedies for skewness can cure monotone spread as well." (William S Cleveland, "Visualizing Data", 1993)

"Pie charts have severe perceptual problems. Experiments in graphical perception have shown that compared with dot charts, they convey information far less reliably. But if you want to display some data, and perceiving the information is not so important, then a pie chart is fine." (Richard Becker & William S Cleveland," S-Plus Trellis Graphics User's Manual", 1996)

✏️Scott Berinato - Collected Quotes

"A chart that knows its context well will naturally end up looking better because it’s showing what it needs to show and nothing else. Good context begets good design. Good charts are only the means to a more profound end: presenting your ideas effectively. Good charts are not the product you’re after. They’re the way to deliver your product - insight." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"A perfectly relevant visualization that breaks a few presentation rules is far more valuable - it’s better - than a perfectly executed, beautiful chart that contains the wrong data, communicates the wrong message, or fails to engage its audience." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"[…] although the relationship between perception and correlation is linear for all types of charts, the linear rate varies between chart types." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Bad complexity neither elucidates important salient points nor shows coherent broader trends. It will obfuscate, frustrate, tax the mind, and ultimately convey trendlessness and confusion to the viewer. Good complexity, in contrast, emerges from visualizations that use more data than humans can reasonably process to form a few salient points." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"But rules are open to interpretation and sometimes arbitrary or even counterproductive when it comes to producing good visualizations. They’re for responding to context, not setting it. Instead of worrying about whether a chart is "right" or "wrong", focus on whether it’s good." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Charts used to confirm are less formal, and designed well enough to be interpreted, but they don’t always have to be presentation worthy. […] Or maybe you don’t know what you’re looking for […] This is exploratory work - rougher still in design, usually iterative, sometimes interactive. Most of us don’t do as much exploratory work as we do declarative and confirmatory; we should do more. It’s a kind of data brainstorming." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Confirmation is a kind of focused exploration, whereas true exploration is more open-ended. The bigger and more complex the data, and the less you know going in, the more exploratory the work. If confirmation is hiking a new trail, exploration is blazing one." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Dataviz has become a competitive imperative for companies. Those that don’t have a critical mass of managers capable of thinking visually will lag behind the ones that do." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Good design isn’t just choosing colors and fonts or coming up with an aesthetic for charts. That’s styling - part of design, but by no means the most important part. Rather, people with design talent develop and execute systems for effective visual communication. They understand how to create and edit visuals to focus an audience and distill ideas." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Good design serves a more important function than simply pleasing you: It helps you access ideas. It improves your comprehension and makes the ideas more persuasive. Good design makes lesser charts good and good charts transcendent." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"In general, charts that contain enough data to take minutes, not seconds, to digest will work better on paper or a personal screen, for an individual who’s not being asked to listen to a presentation while trying to take in so much information." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Keep in mind that bars, lines, and scatter plots are your workhorses. Those three forms alone will help you arrive at many good charts in most situations. While you shouldn’t shun other forms, you also don’t need to choose different ones just to be different." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"People feel data. They don’t just process statistics and come to rational conclusions. They form emotions about the data visualization. We are not informed by charts; we’re affected by them." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Sketching bridges idea and visualization. Good sketches are quick, simple, and messy. Don’t think too much about real values or scales or any refining details. In fact, don’t think too much. Just keep in mind those keywords, the possible forms they suggest, and that overarching idea you keep coming back to, the one you wrote down in answer to What am I trying to say (or learn)? And draw. Create shapes, develop a sense of what you want your audience to see. Try anything." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"To build fluency in this new language, to tap into this vehicle for professional growth, and to give your organization a competitive edge, you first need to recognize a good chart when you see one." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Unlike text, visual communication is governed less by an agreed-upon convention between 'writer' and 'reader' than by how our visual systems react to stimuli, often before we’re aware of it. And just as composers use music theory to create music that produces certain predictable effects on an audience, chart makers can use visual perception theory to make more-effective visualizations with similarly predictable effects." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Ultimately, when you create a visualization, that’s what you need to know. Is it good? Is it effective? Are you helping people see an idea and learn from it? Are you making your case?" (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Visualization is an abstraction, a way to reduce complexity […] complexity and color catch the eye; they’re captivating. They can also make it harder to extract meaning from a chart." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"We see first what stands out. Our eyes go right to change and difference - peaks, valleys, intersections, dominant colors, outliers. Many successful charts - often the ones that please us the most and are shared and talked about - exploit this inclination by showing a single salient point so clearly that we feel we understand the chart’s meaning without even trying." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"When deeply complex charts work, we find them effective and beautiful, just as we find a symphony beautiful, which is another marvelously complex arrangement of millions of data points that we experience as a coherent whole." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Without context, no one […] can say whether that chart is good. In the absence of context, a chart is neither good nor bad. It’s only well built or poorly built. To judge a chart’s value, you need to know more - much more - than whether you used the right chart type, picked good colors, or labeled axes correctly. Those things can help make charts good, but in the absence of context they’re academic considerations. It’s far more important to know Who will see this? What do they want? What do they need? What idea do I want to convey? What could I show? What should I show? Then, after all that, How will I show it?" (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Your eyes and your brain always notice more dynamic visual information first and fastest. The implicit lesson is to make the idea you want people to see stand out. Conversely, make sure you’re not helping people see something that either doesn’t help convey your idea or actively fights against it." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

✏️Antony Unwin - Collected Quotes

"Data Visulization is related to Information Visualization, but there are important differences. Data Visualization is for exploration, for uncovering information, as well as for presenting information. It is certainly a goal of Data Visualization to present any information in the data, but another goal is to display the raw data themselves, revealing the inherent variability and uncertainty." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Deciding on which graphics to use is often a matter of taste. What one person thinks are good graphics for illustrating information may not appeal to someone else. It may also happen that different people interpret the same graphic in quite different ways." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Histograms use area to represent counts of a distribution. This makes them somewhat related to barcharts and mosaic plots, although the number or the width of the bins of a histogram is not determined a priori and the bins are drawn without gaps between them reflecting the continuous scale of the data. Whereas barcharts and mosaic plots show the exact distribution of the sample, a histogram is always just one approximation to the distribution of the data. Sometimes histograms are also used as crude density estimators for some 'true', but usually unknown, underlying distribution for the data. There are much better density estimation methods that produce smooth distribution displays." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"How would a million be visualized today? If you have ever drawn a histogram or a scatterplot of a million cases, you know that it is possible, but that there are problems. The screen resolution of a computer cannot be high enough to show very small bars in the histogram, and in regions of high density the scatterplots look like black blobs with huge numbers of points piled on top of one another. (It is noteworthy - and useful - that the weaknesses of the two kinds of plot arise at opposite extremes of the distributional densities.) So what should be visualized? If the distributional form of the bulk of the data is of interest, then the histogram will be fine for one-dimensional views (and it may give some information about outliers too). If individual outliers are of interest, then the scatterplot will be pretty good (and it will give a fair bit of distributional information as well). One aim might be described as global, attempting to summarise the main structure, and the other as local, attempting to identify individual features. Ideally, both kinds of plot are needed to satisfy both aims." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Largeness comes in different forms and has many different effects. Whereas some tasks remain easy, others become obstinately difficult. Largeness is not just an increase in dataset size. [...] Largeness may mean more complexity - more variables, more detail (additional categories, special cases), and more structure (temporal or spatial components, combinations of relational data tables). Again this is not so much of a problem with small datasets, where the complexity will be by definition limited, but becomes a major problem with large datasets. They will often have special features that do not fit the standard case by variable matrix structure well-known to statisticians." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Like parallel coordinates, networks are drawn with many lines, and so an increase in magnitude has a more dramatic effect on networks than it does on point or area plots. The main issue is not drawing optimal layouts but drawing informative and acceptable layouts fast enough to be useful. In particular, this chapter makes clear that having to analyze applications with a million nodes is not at all unusual. With trees, the task is different again. Large datasets do not lead to specially large trees, but complex datasets may lead to many, many trees, and the visualization here concentrates on the task of combining and summarizing the information from large numbers of trees. A broad range of innovative displays is introduced for these specialist tasks, though they all have their origins in existing plots." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Many different words can be used to describe graphic representations of data, but the overall aim is always to visualize the information in the data and so the term Data Visualization is the best universal term. Other terms have different connotations." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Mosaic plots […] are designed to show the dependencies and interactions between multiple categorical variables in one plot. […] . A spineplot can be regarded as a kind of one-dimensional mosaic plot. […] In contrast with a barchart, where the bars are aligned to an axis, the mosaic plot uses a rectangular region, which is subdivided into tiles according to the numbers of observations falling into the different classes. This subdivision is done recursively, or in statistical terms conditionally, as more variables are included." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Statistics has its own basic suite of domain-specific visualization tools. These statistical graphics can best be classified by the kind of data that they depict. Statistical data are usually characterized by their scale: nominal, ordinal (which are both categorical) or numerical (which is usually regarded as continuous). What is most important in distinguishing statistical graphics from other graphics is their universality: statistical graphics are not tailored towards only one specific application but are valid for any data measured on the appropriate scales." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Tables are fine for viewing sections of a dataset, but simple scrolling is no longer a practical navigational option." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"There are plenty of graphical displays that work well for small datasets and that can be found in the commonly available software packages, but they do not automatically scale up. Dotplots, scatterplots, and parallel coordinate plots all suffer from overplotting with large datasets; just think of drawing a scatterplot of a million points." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"The days of trawling through endless volumes of frequency tables for every variable and of contingency tables for every pair of variables are still sadly with us. Automatic filtering and storing of results are essential first steps to help analysts to concentrate on the important issues that require human input to interpret the result." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"The recursive construction of a mosaic plot means that the only limit for the number of variables included is the number of tiles to display, i.e. the number of possible combinations of the variables. […] If interactive queries are not available, the following strategy has proved to be helpful. Variables with only few categories should be put in the plot first, to keep the number of conditioned groups small. If one of the variables in the plot is a binary response, showing this variable via highlighting will reduce the number of tiles by half. Note that the gaps between the tiles are not part of the rectangular region that is used to build the tiles. The gaps are there to improve visual discrimination." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"The simplest way to plot univariate continuous data is a dotplot. Because the points are distributed along only one axis, overplotting is a serious problem, no matter how small the sample is. The usual technique to avoid overplotting is jittering, i.e., the data are randomly spread along a virtual second axis." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Clearly principles and guidelines for good presentation graphics have a role to play in exploratory graphics, but personal taste and individual working style also play important roles. The same data may be presented in many alternative ways, and taste and customs differ as to what is regarded as a good presentation graphic. Nevertheless, there are principles that should be respected and guidelines that are generally worth following. No one should expect a perfect consensus where graphics are concerned." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"Data visualization [...] expresses the idea that it involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid readers or viewers in seeing the structure in the data. The term data visualization is related to the new field of information visualization. This includes visualization of all kinds of information, not just of data, and is closely associated with research by computer scientists." (Antony Unwin et al, "Introduction" [in "Handbook of Data Visualization"], 2008)

"For a given dataset there is not a great deal of advice which can be given on content and context. Those who know their own data should know best for their specific purposes. It is advisable to think hard about what should be shown and to check with others if the graphic makes the desired impression. Design should be left to designers, though some basic guidelines should be followed: consistency is important (sets of graphics should be in similar style and use equivalent scaling); proximity is helpful (place graphics on the same page, or on the facing page, of any text that refers to them); and layout should be checked (graphics should be neither too small nor too large and be attractively positioned relative to the whole page or display)." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"There are two main reasons for using graphic displays of datasets: either to present or to explore data. Presenting data involves deciding what information you want to convey and drawing a display appropriate for the content and for the intended audience. [...] Exploring data is a much more individual matter, using graphics to find information and to generate ideas. Many displays may be drawn. They can be changed at will or discarded and new versions prepared, so generally no one plot is especially important, and they all have a short life span." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"Eye-catching data graphics tend to use designs that are unique (or nearly so) without being strongly focused on the data being displayed. In the world of Infovis, design goals can be pursued at the expense of statistical goals. In contrast, default statistical graphics are to a large extent determined by the structure of the data (line plots for time series, histograms for univariate data, scatterplots for bivariate nontime-series data, and so forth), with various conventions such as putting predictors on the horizontal axis and outcomes on the vertical axis. Most statistical graphs look like other graphs, and statisticians often think this is a good thing." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Providing the right comparisons is important, numbers on their own make little sense, and graphics should enable readers to make up their own minds on any conclusions drawn, and possibly see more. On the Infovis side, computer scientists and designers are interested in grabbing the readers' attention and telling them a story. When they use data in a visualization (and data-based graphics are only a subset of the field of Infovis), they provide more contextual information and make more effort to awaken the readers' interest. We might argue that the statistical approach concentrates on what can be got out of the available data and the Infovis approach uses the data to draw attention to wider issues. Both approaches have their value, and it would probably be best if both could be combined." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Statisticians tend to use standard graphic forms (e.g., scatterplots and time series), which enable the experienced reader to quickly absorb lots of information but may leave other readers cold. We personally prefer repeated use of simple graphical forms, which we hope draw attention to the data rather than to the form of the display." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"[…] we do see a tension between the goal of statistical communication and the more general goal of communicating the qualitative sense of a dataset. But graphic design is not on one side or another of this divide. Rather, design is involved at all stages, especially when several graphics are combined to contribute to the overall picture, something we would like to see more of." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)

"Yes, it can sometimes be possible for a graph to be both beautiful and informative […]. But such synergy is not always possible, and we believe that an approach to data graphics that focuses on celebrating such wonderful examples can mislead people by obscuring the tradeoffs between the goals of visual appeal to outsiders and statistical communication to experts." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)