10 December 2006

Forrest W Young - Collected Quotes

"A boxplot is a dotplot enhanced with a schematic that provides information about the center and spread of the data, including the median, quartiles, and so on. This is a very useful way of summarizing a variable's distribution. The dotplot can also be enhanced with a diamond-shaped schematic portraying the mean and standard deviation (or the standard error of the mean)." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
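To make the quote concrete, here is a minimal Python sketch (my own illustration, not code from the book) that computes the numbers a boxplot and a mean/SD diamond would display. The function name boxplot_summary is hypothetical. Quartiles come from the standard library's statistics.quantiles with the "inclusive" method (Python 3.8+), and other tools may interpolate quartiles slightly differently. The whiskers here simply run to the minimum and maximum, whereas many boxplot implementations stop them at 1.5 times the interquartile range.

```python
import statistics

def boxplot_summary(values):
    """Return the numbers drawn by a boxplot (min, quartiles, max) and by a
    mean/SD diamond (mean, standard deviation, standard error of the mean)."""
    data = sorted(values)
    # Three cut points: first quartile, median, third quartile.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)        # sample standard deviation
    sem = sd / len(data) ** 0.5        # standard error of the mean
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "mean": mean, "sd": sd, "sem": sem,
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary["q1"], summary["median"], summary["q3"])
```

Whether the diamond shows the SD or the SEM is a display choice; the SEM shrinks as the sample grows, so it describes the precision of the mean rather than the spread of the data.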

"A scatterplot reveals the strength and shape of the relationship between a pair of variables. A scatterplot represents the two variables by axes drawn at right angles to each other, showing the observations as a cloud of points, each point located according to its values on the two variables. Various lines can be added to the plot to help guide our search for understanding." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
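The "strength" of a linear relationship is usually summarized by Pearson's correlation, and the most common guide line is the least-squares regression line. The sketch below (my own illustration with hypothetical function names, not the book's code) computes both from plain Python lists. It is only one choice of "line"; smoothers such as lowess are another common option that this sketch does not cover.

```python
import math

def _centered_sums(xs, ys):
    """Return the means and the centered cross-products Sxy, Sxx, Syy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy

def pearson_r(xs, ys):
    """Correlation in [-1, 1]: sign gives direction, magnitude gives strength."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing squared vertical distances."""
    mx, my, sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

xs, ys = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
print(pearson_r(xs, ys), least_squares_line(xs, ys))
```

A correlation near zero does not rule out a strong curved relationship, which is exactly why the scatterplot itself is worth looking at.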

"A statistical hypothesis is a statement that specifies a set of possible distributions of the data variable x. In hypothesis testing, the goal is to see if there is sufficient statistical evidence to reject a presumed null hypothesis in favor of a conjectured alternative hypothesis. Confirmatory statistics used the formalisms of mathematical proofs, theorems, derivations, and so on, to provide a firm mathematical foundation for hypothesis testing." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
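As one concrete instance of "sufficient statistical evidence to reject a presumed null hypothesis", here is a permutation test, which I have swapped in for illustration; it is not presented in the quote. The null hypothesis is that both groups come from the same distribution, so group labels are exchangeable; the p-value is the share of label shuffles giving a difference in means at least as large as the observed one. The function name and default parameters are assumptions.

```python
import random

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test of H0: a and b share one distribution,
    using the absolute difference in group means as the test statistic."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                      # relabel observations at random
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            extreme += 1
    # The +1 terms count the observed labelling and keep p strictly above 0.
    return (extreme + 1) / (n_resamples + 1)

print(permutation_test([1, 2, 3, 4, 5], [101, 102, 103, 104, 105], n_resamples=2000))
```

A small p-value is evidence against the null; it does not by itself confirm any particular alternative.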

"After all, we do agree that statistical data analysis is concerned with generating and evaluating hypotheses about data. For us, generating hypotheses means that we are searching for patterns in the data - trying to 'see what the data seem to say'. And evaluating hypotheses means that we are seeking an explanation or at least a simple description of what we find - trying to verify what we believe we see." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"Commonly, data do not make a clear and unambiguous statement about our world, often requiring tools and methods to provide such clarity. These methods, called statistical data analysis, involve collecting, manipulating, analyzing, interpreting, and presenting data in a form that can be used, understood, and communicated to others." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"Exploring data generates hypotheses about patterns in our data. The visualizations and tools of dynamic interactive graphics ease and improve the exploration, helping us to 'see what our data seem to say'." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"Histograms and frequency polygons display a schematic of a numeric variable's frequency distribution. These plots can show us the center and spread of a distribution, can be used to judge the skewness, kurtosis, and modality of a distribution, can be used to search for outliers, and can help us make decisions about the symmetry and normality of a distribution." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
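The sketch below (my own illustration with hypothetical names) bins a variable into equal-width classes, which is what a histogram draws, and computes moment-based skewness and excess kurtosis, the numeric counterparts of what the eye judges in the plot. These are the simple population-moment formulas; statistical packages often apply small-sample corrections, so their values may differ slightly. The data are assumed to be non-constant.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width classes spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = int((v - lo) / width) if width else 0
        counts[min(i, bins - 1)] += 1    # the maximum falls into the last bin
    return lo, width, counts

def skew_and_kurtosis(values):
    """Moment skewness (0 for symmetric data) and excess kurtosis (0 for a normal)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

print(histogram_counts([1, 2, 3, 4, 5], bins=2))
print(skew_and_kurtosis([1, 1, 1, 10]))   # long right tail, so skewness > 0
```

The bin count matters: too few bins hide modes, too many turn the histogram into noise.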

"Linking is a powerful dynamic interactive graphics technique that can help us better understand high-dimensional data. This technique works in the following way: When several plots are linked, selecting an observation's point in a plot will do more than highlight the observation in the plot we are interacting with - it will also highlight points in other plots with which it is linked, giving us a more complete idea of its value across all the variables. Selecting is done interactively with a pointing device. The point selected, and corresponding points in the other linked plots, are highlighted simultaneously. Thus, we can select a cluster of points in one plot and see if it corresponds to a cluster in any other plot, enabling us to investigate the high-dimensional shape and density of the cluster of points, and permitting us to investigate the structure of the disease space." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
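One way to picture linking in code: all plots share a single selection of observation indices, and brushing in any plot updates that shared selection, which then highlights the same observations everywhere. The classes below are a hypothetical, display-free sketch of that idea, not the API of the software described in the book.

```python
class LinkedSelection:
    """Shared selection state: every registered plot highlights the same observations."""
    def __init__(self):
        self.selected = set()
        self.plots = []

    def register(self, plot):
        self.plots.append(plot)
        plot.link = self

    def select(self, indices):
        self.selected = set(indices)
        for plot in self.plots:           # propagate to all linked plots
            plot.highlight(self.selected)

class ScatterView:
    """A scatterplot reduced to its data and its set of highlighted points."""
    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys
        self.highlighted = set()
        self.link = None

    def brush(self, x0, x1, y0, y1):
        """Select the observations inside a rectangle drawn on this plot."""
        hits = [i for i, (x, y) in enumerate(zip(self.xs, self.ys))
                if x0 <= x <= x1 and y0 <= y <= y1]
        self.link.select(hits)

    def highlight(self, indices):
        self.highlighted = set(indices)

a, b, c = [0.5, 0.8, 2.0, 3.0], [0.2, 0.9, 0.5, 2.5], [10, 20, 30, 40]
ab, bc = ScatterView(a, b), ScatterView(b, c)
link = LinkedSelection()
link.register(ab)
link.register(bc)
ab.brush(0, 1, 0, 1)          # brush a cluster in the a-b plot...
print(bc.highlighted)         # ...and the same observations light up in the b-c plot
```

Because the selection is stored by observation index rather than by screen position, any number of plots over different variables can share it.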

"The simplest and most common way to represent the empirical distribution of a numerical variable is by showing the individual values as dots arranged along a line. The main difficulty with this plot concerns how to treat tied values. We usually don't want to represent them by the same point, since that means that the two values look like one. What we can do is 'jitter' the points a bit (i.e., move them back and forth at right angles to the plot axis) so that all points are visible. […] In addition to permitting you to identify individual points, dotplots allow you to look into some of the distributional properties of a variable. […] Dotplots can also be good for looking for modality." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
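Jittering as described in the quote can be sketched in a few lines: each value keeps its position along the plot axis, and only a small random offset is added at right angles to it, so tied values no longer sit on top of each other. The function name and the default spread are assumptions for illustration; a fixed seed makes the layout reproducible.

```python
import random

def jittered_dotplot(values, spread=0.1, seed=None):
    """Return (x, y) plotting positions: x is the value itself, y is a random
    offset in [-spread, spread] perpendicular to the axis."""
    rng = random.Random(seed)
    return [(v, rng.uniform(-spread, spread)) for v in values]

points = jittered_dotplot([3, 3, 3, 5], spread=0.2, seed=1)
print(points)   # the three tied 3s now get different vertical offsets
```

Since jitter only moves points perpendicular to the axis, the values themselves, and any reading of center, spread, or modality, are unchanged.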

"The way that the model differs from the data gives us clues about how we can improve our model. We can use mosaic displays to find the specific ways in which the model is different from the data, since mosaics show the residuals (or differences) of the cells with respect to the model. Looking at these differences, we can observe patterns in the deviation that will help us in our search." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
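A common choice for the residuals shaded in mosaic displays is the Pearson residual, (observed - expected) / sqrt(expected), computed against a model such as independence of the rows and columns. The sketch below assumes that model and uses a hypothetical function name; it computes only the residuals and does not draw the mosaic. Large positive values mark cells with more cases than the model predicts, large negative values mark cells with fewer.

```python
import math

def independence_residuals(table):
    """Pearson residuals for a two-way table of counts under the independence
    model, where expected[i][j] = row_total[i] * col_total[j] / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals

print(independence_residuals([[20, 0], [0, 20]]))   # strong positive diagonal
```

The pattern of signs, here positive on the diagonal and negative off it, is the kind of systematic deviation that suggests how to extend the model.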

"Transforming data to measurements of a different kind can clarify and simplify hypotheses that have already been generated and can reveal patterns that would otherwise be hidden." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"One of the main problems with the visual approach to statistical data analysis is that it is too easy to generate too many plots: We can easily become totally overwhelmed by the sheer number and variety of graphics that we can generate. In a sense, we have been too successful in our goal of making it easy for the user: Many, many plots can be generated, so many that it becomes impossible to understand our data." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
