"Data Visualization is related to Information Visualization, but there are important differences. Data Visualization is for exploration, for uncovering information, as well as for presenting information. It is certainly a goal of Data Visualization to present any information in the data, but another goal is to display the raw data themselves, revealing the inherent variability and uncertainty." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
"Deciding on which graphics to use is often a matter of taste. What one person thinks are good graphics for illustrating information may not appeal to someone else. It may also happen that different people interpret the same graphic in quite different ways." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
"Histograms use area to represent counts of a distribution. This makes them somewhat related to barcharts and mosaic plots, although the number or the width of the bins of a histogram is not determined a priori and the bins are drawn without gaps between them reflecting the continuous scale of the data. Whereas barcharts and mosaic plots show the exact distribution of the sample, a histogram is always just one approximation to the distribution of the data. Sometimes histograms are also used as crude density estimators for some 'true', but usually unknown, underlying distribution for the data. There are much better density estimation methods that produce smooth distribution displays." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
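A minimal sketch, not taken from Unwin et al., of why a histogram is "always just one approximation": eight hypothetical values are binned twice with the same bin width but a shifted bin origin, and the counts change. A Gaussian kernel density estimate is shown as one of the smoother density estimators the quote alludes to. The function names, the bin origins, the bandwidth of 0.5 and the data are all illustrative assumptions.

```python
import math

def histogram(data, bin_width, origin=0.0):
    """Count observations per bin [origin + k*w, origin + (k+1)*w).

    Returns (left edge, count) pairs in ascending order; empty bins are omitted.
    """
    counts = {}
    for x in data:
        k = math.floor((x - origin) / bin_width)
        counts[k] = counts.get(k, 0) + 1
    return [(origin + k * bin_width, counts[k]) for k in sorted(counts)]

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return density

data = [1.2, 1.9, 2.1, 2.4, 3.3, 3.8, 4.0, 5.6]   # hypothetical sample
print(histogram(data, 1.0))        # bins starting at 0.0
print(histogram(data, 1.0, 0.5))   # same width, origin shifted by 0.5
f = gaussian_kde(data, 0.5)
print(round(f(2.0), 3))            # smooth estimate at x = 2.0
```

Shifting the origin moves one observation from the bin starting at 2.0 into the bin starting at 1.5, so the tallest bar appears in a different place even though the data are identical; the kernel estimate has no bin origin, only a bandwidth.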
"How would a million be visualized today? If you have ever drawn a histogram or a scatterplot of a million cases, you know that it is possible, but that there are problems. The screen resolution of a computer cannot be high enough to show very small bars in the histogram, and in regions of high density the scatterplots look like black blobs with huge numbers of points piled on top of one another. (It is noteworthy - and useful - that the weaknesses of the two kinds of plot arise at opposite extremes of the distributional densities.) So what should be visualized? If the distributional form of the bulk of the data is of interest, then the histogram will be fine for one-dimensional views (and it may give some information about outliers too). If individual outliers are of interest, then the scatterplot will be pretty good (and it will give a fair bit of distributional information as well). One aim might be described as global, attempting to summarise the main structure, and the other as local, attempting to identify individual features. Ideally, both kinds of plot are needed to satisfy both aims." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
"Largeness comes in different forms and has many different effects. Whereas some tasks remain easy, others become obstinately difficult. Largeness is not just an increase in dataset size. [...] Largeness may mean more complexity - more variables, more detail (additional categories, special cases), and more structure (temporal or spatial components, combinations of relational data tables). Again this is not so much of a problem with small datasets, where the complexity will be by definition limited, but becomes a major problem with large datasets. They will often have special features that do not fit the standard case by variable matrix structure well-known to statisticians." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
"Like parallel coordinates, networks are drawn with many lines, and so an increase in magnitude has a more dramatic effect on networks than it does on point or area plots. The main issue is not drawing optimal layouts but drawing informative and acceptable layouts fast enough to be useful. In particular, this chapter makes clear that having to analyze applications with a million nodes is not at all unusual. With trees, the task is different again. Large datasets do not lead to specially large trees, but complex datasets may lead to many, many trees, and the visualization here concentrates on the task of combining and summarizing the information from large numbers of trees. A broad range of innovative displays is introduced for these specialist tasks, though they all have their origins in existing plots." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
"Many different words can be used to describe graphic representations of data, but the overall aim is always to visualize the information in the data and so the term Data Visualization is the best universal term. Other terms have different connotations." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
"Mosaic plots […] are designed to show the dependencies and interactions between multiple categorical variables in one plot. […]. A spineplot can be regarded as a kind of one-dimensional mosaic plot. […] In contrast with a barchart, where the bars are aligned to an axis, the mosaic plot uses a rectangular region, which is subdivided into tiles according to the numbers of observations falling into the different classes. This subdivision is done recursively, or in statistical terms conditionally, as more variables are included." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
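The recursive (conditional) subdivision can be sketched in a few lines. This is an illustrative assumption, not the book's code: it alternates horizontal and vertical splits, orders categories alphabetically, leaves out the gaps between tiles, and uses made-up records with two categorical variables. The function name `mosaic` is hypothetical.

```python
from collections import Counter

def mosaic(records, rect=(0.0, 0.0, 1.0, 1.0), depth=0, prefix=()):
    """Recursively split rect = (x, y, width, height) by each variable in turn.

    Variable 0 splits along x, variable 1 along y, alternating thereafter.
    Each split uses the proportions *within* the current tile, i.e. the
    conditional distribution given the categories chosen so far.
    Assumes records is non-empty and all tuples have the same length.
    Returns a list of (category tuple, (x, y, w, h)) for every non-empty tile.
    """
    n_vars = len(records[0])
    if depth == n_vars:
        return [(prefix, rect)]
    x, y, w, h = rect
    counts = Counter(r[depth] for r in records)
    total = len(records)
    tiles = []
    offset = 0.0
    for level in sorted(counts):
        share = counts[level] / total
        if depth % 2 == 0:   # split the width
            sub = (x + offset * w, y, share * w, h)
        else:                # split the height
            sub = (x, y + offset * h, w, share * h)
        offset += share
        subset = [r for r in records if r[depth] == level]
        tiles.extend(mosaic(subset, sub, depth + 1, prefix + (level,)))
    return tiles

# Hypothetical records: (group, outcome)
records = ([("crew", "no")] * 3 + [("crew", "yes")]
           + [("first", "no")] + [("first", "yes")] * 3)
for cats, (x, y, w, h) in mosaic(records):
    print(cats, round(w * h, 3))
```

Each tile's area equals its share of all observations, here 3/8, 1/8, 1/8 and 3/8 of the unit square, while the heights within each column show the conditional distribution of the second variable.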
"Statistics has its own basic suite of domain-specific visualization tools. These statistical graphics can best be classified by the kind of data that they depict. Statistical data are usually characterized by their scale: nominal, ordinal (which are both categorical) or numerical (which is usually regarded as continuous). What is most important in distinguishing statistical graphics from other graphics is their universality: statistical graphics are not tailored towards only one specific application but are valid for any data measured on the appropriate scales." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
"Tables are fine for viewing sections of a dataset, but simple scrolling is no longer a practical navigational option." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
"There are plenty of graphical displays that work well for small datasets and that can be found in the commonly available software packages, but they do not automatically scale up. Dotplots, scatterplots, and parallel coordinate plots all suffer from overplotting with large datasets; just think of drawing a scatterplot of a million points." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
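To make the overplotting problem concrete, the sketch below maps simulated points onto a fixed pixel grid and counts how many land on each pixel. It assumes a 200 by 200 pixel plot, 100,000 standard normal points (fewer than a million to keep pure Python fast) and a hypothetical helper `pixel_counts`; shading pixels by these counts, a binned scatterplot, is one common remedy but is not claimed to be the method of the quoted book.

```python
import random
from collections import Counter

def pixel_counts(points, width_px, height_px, xlim, ylim):
    """Map each (x, y) point to an integer pixel cell and count points per cell.

    Points outside xlim/ylim are clipped into the border cells.
    """
    (x0, x1), (y0, y1) = xlim, ylim
    counts = Counter()
    for x, y in points:
        i = min(width_px - 1, max(0, int((x - x0) / (x1 - x0) * width_px)))
        j = min(height_px - 1, max(0, int((y - y0) / (y1 - y0) * height_px)))
        counts[(i, j)] += 1
    return counts

rng = random.Random(0)  # fixed seed so the run is reproducible
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]
counts = pixel_counts(points, 200, 200, (-4, 4), (-4, 4))
print(len(points), "points fall on", len(counts), "distinct pixels")
print("most crowded pixel holds", max(counts.values()), "points")
```

An ordinary scatterplot draws every crowded pixel in the same solid colour, so the differences in these counts, which carry the distributional information, are lost in the "black blob".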
"The days of trawling through endless volumes of frequency tables for every variable and of contingency tables for every pair of variables are still sadly with us. Automatic filtering and storing of results are essential first steps to help analysts to concentrate on the important issues that require human input to interpret the result." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
"The recursive construction of a mosaic plot means that the only limit for the number of variables included is the number of tiles to display, i.e. the number of possible combinations of the variables. […] If interactive queries are not available, the following strategy has proved to be helpful. Variables with only few categories should be put in the plot first, to keep the number of conditioned groups small. If one of the variables in the plot is a binary response, showing this variable via highlighting will reduce the number of tiles by half. Note that the gaps between the tiles are not part of the rectangular region that is used to build the tiles. The gaps are there to improve visual discrimination." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
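The arithmetic behind this strategy can be sketched as follows, assuming hypothetical explanatory variables with 4, 2 and 2 categories plus one binary response; the helper names are illustrative only. The final number of tiles is the product of the category counts whatever the order, but putting the variables with few categories first keeps the number of conditioned groups small at the early splits, and showing the binary response by highlighting removes its factor of two.

```python
import operator
from itertools import accumulate
from math import prod

def groups_per_level(category_counts):
    """Number of conditioned groups after each variable is added, in order."""
    return list(accumulate(category_counts, operator.mul))

def tile_count(explanatory_levels, binary_response_as_split=True):
    """Total tiles in the mosaic.

    If the binary response is another split it doubles the tiles; shown via
    highlighting instead, it only colours part of each existing tile.
    """
    tiles = prod(explanatory_levels)
    return tiles * 2 if binary_response_as_split else tiles

print(groups_per_level([4, 2, 2]))   # many-category variable first
print(groups_per_level([2, 2, 4]))   # few-category variables first
print(tile_count([4, 2, 2]), tile_count([4, 2, 2], binary_response_as_split=False))
```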
"The simplest way to plot univariate continuous data is a dotplot. Because the points are distributed along only one axis, overplotting is a serious problem, no matter how small the sample is. The usual technique to avoid overplotting is jittering, i.e., the data are randomly spread along a virtual second axis." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)
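Jittering can be sketched as pairing each data value with a random offset on the virtual second axis. The uniform offsets, the default width of 0.4, the `seed` parameter and the function name `jitter` are assumptions made for illustration.

```python
import random

def jitter(values, width=0.4, seed=None):
    """Pair each value with a random offset on a virtual second axis.

    The data values are left unchanged; only the second coordinate is random,
    drawn uniformly from [-width/2, width/2].
    """
    rng = random.Random(seed)
    return [(v, rng.uniform(-width / 2, width / 2)) for v in values]

# Three tied values would sit on a single dot without jittering.
for x, y in jitter([1.0, 1.0, 1.0, 2.5], seed=1):
    print(x, round(y, 3))
```

Passing a seed makes the jittered plot reproducible; the offsets carry no information, so only the position along the data axis should be read.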
"Clearly principles and guidelines for good presentation graphics have a role to play in exploratory graphics, but personal taste and individual working style also play important roles. The same data may be presented in many alternative ways, and taste and customs differ as to what is regarded as a good presentation graphic. Nevertheless, there are principles that should be respected and guidelines that are generally worth following. No one should expect a perfect consensus where graphics are concerned." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)
"Data visualization [...] expresses the idea that it involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid readers or viewers in seeing the structure in the data. The term data visualization is related to the new field of information visualization. This includes visualization of all kinds of information, not just of data, and is closely associated with research by computer scientists." (Antony Unwin et al, "Introduction" [in "Handbook of Data Visualization"], 2008)
"For a given dataset there is not a great deal of advice which can be given on content and context. Those who know their own data should know best for their specific purposes. It is advisable to think hard about what should be shown and to check with others if the graphic makes the desired impression. Design should be left to designers, though some basic guidelines should be followed: consistency is important (sets of graphics should be in similar style and use equivalent scaling); proximity is helpful (place graphics on the same page, or on the facing page, of any text that refers to them); and layout should be checked (graphics should be neither too small nor too large and be attractively positioned relative to the whole page or display)." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)
"There are two main reasons for using graphic displays of datasets: either to present or to explore data. Presenting data involves deciding what information you want to convey and drawing a display appropriate for the content and for the intended audience. [...] Exploring data is a much more individual matter, using graphics to find information and to generate ideas. Many displays may be drawn. They can be changed at will or discarded and new versions prepared, so generally no one plot is especially important, and they all have a short life span." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)
"Eye-catching data graphics tend to use designs that are unique (or nearly so) without being strongly focused on the data being displayed. In the world of Infovis, design goals can be pursued at the expense of statistical goals. In contrast, default statistical graphics are to a large extent determined by the structure of the data (line plots for time series, histograms for univariate data, scatterplots for bivariate nontime-series data, and so forth), with various conventions such as putting predictors on the horizontal axis and outcomes on the vertical axis. Most statistical graphs look like other graphs, and statisticians often think this is a good thing." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)
"Providing the right comparisons is important, numbers on their own make little sense, and graphics should enable readers to make up their own minds on any conclusions drawn, and possibly see more. On the Infovis side, computer scientists and designers are interested in grabbing the readers' attention and telling them a story. When they use data in a visualization (and data-based graphics are only a subset of the field of Infovis), they provide more contextual information and make more effort to awaken the readers' interest. We might argue that the statistical approach concentrates on what can be got out of the available data and the Infovis approach uses the data to draw attention to wider issues. Both approaches have their value, and it would probably be best if both could be combined." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)
"Statisticians tend to use standard graphic forms (e.g., scatterplots and time series), which enable the experienced reader to quickly absorb lots of information but may leave other readers cold. We personally prefer repeated use of simple graphical forms, which we hope draw attention to the data rather than to the form of the display." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)
"[…] we do see a tension between the goal of statistical communication and the more general goal of communicating the qualitative sense of a dataset. But graphic design is not on one side or another of this divide. Rather, design is involved at all stages, especially when several graphics are combined to contribute to the overall picture, something we would like to see more of." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)
"Yes, it can sometimes be possible for a graph to be both beautiful and informative […]. But such synergy is not always possible, and we believe that an approach to data graphics that focuses on celebrating such wonderful examples can mislead people by obscuring the tradeoffs between the goals of visual appeal to outsiders and statistical communication to experts." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)