Showing posts with label quotes. Show all posts

07 June 2026

📉Graphical Representation: Representation (Just the Quotes)

"The advantages proposed by [the graphical] mode of representation, are to facilitate the attainment of information, and aid the memory in retaining it: which two points form the principal business in what we call learning. Of all the senses, the eye gives the liveliest and most accurate idea of whatever is susceptible of being represented to it; and when proportion between different quantities is the object, then the eye has an incalculable superiority." (William Playfair, The Statistical Breviary", 1801)

"They [diagrams] are designed not so much to allow of reference to particular numbers, which can be better had from printed tables of figures, as to exhibit to the eye the general results of large masses of figures which it is hopeless to attack in any other way than by graphical representation." (William S Jevons, [letter to Richard Hutton] 1863)

"Whereas the Eulerian plan endeavoured at once and directly to represent propositions, or relations of class terms to one another, we shall find it best to begin by representing only classes, and then proceed to modify these in some way so as to make them indicate what our propositions have to say. How, then, shall we represent all the subclasses which two or more class terms can produce? Bear in mind that what we have to indicate is the successive duplication of the number of subdivisions produced by the introduction of each successive term. and we shall see our way to a very important departure from the Eulerian conception. All that we have to do is to draw our figures, say circles, so that each successive one which we introduce shall intersect once, and once only, all the subdivisions already existing, and we then have what may be called a general framework indicating every possible combination producible by the given class terms." (John Venn, "On the Diagrammatic and Mechanical Representation of Propositions and Reasonings", 1880)

"The essential quality of graphic representations is clarity. If the diagram fails to give a clearer impression than the tables of figures it replaces, it is useless. To this end, we will avoid complicating the diagram by including too much data." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

"Graphic representation by means of charts depends upon the super-position of special lines or curves upon base lines drawn or ruled in a standard manner. For the economic construction of these charts as well as their correct use it is necessary that the standard rulings be correctly designed." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"To summarize - with the ordinary arithmetical scale, fluctuations in large factors are very noticeable, while relatively greater fluctuations in smaller factors are barely apparent. The logarithmic scale permits the graphic representation of changes in every quantity without respect to the magnitude of the quantity itself. At the same time, the logarithmic scale shows the actual value by reference to the numbers in the vertical scale. By indicating both absolute and relative values and changes, the logarithmic scale combines the advantages of both the natural and the percentage scale without the disadvantages of either." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"With the ordinary scale, fluctuations in large factors are very noticeable, while relatively greater fluctuations in smaller factors are barely apparent. The semi-logarithmic scale permits the graphic representation of changes in every quantity on the same basis, without respect to the magnitude of the quantity itself. At the same time, it shows the actual value by reference to the numbers in the scale column. By indicating both absolute and relative value and changes to one scale, it combines the advantages of both the natural and percentage scale, without the disadvantages of either." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"A graph is a pictorial representation or statement of a series of values all drawn to scale. It gives a mental picture of the results of statistical examination in one case while in another it enables calculations to be made by drawing straight lines or it indicates a change in quantity together with the rate of that change. A graph then is a picture representing some happenings and so designed as to bring out all points of significance in connection with those happenings. When the curve has been plotted delineating these happenings a general inspection of it shows the essential character of the table or formula from which it was derived." (William C Marshall, "Graphical methods for schools, colleges, statisticians, engineers and executives", 1921)

"At the present time there is a total lack of standardization in the form of diagram to use for nearly all classes of representation. This makes it difficult to compare reports of different investigators on the same subject because their diagrams are not constructed alike." (William C Marshall, "Graphical methods for schools, colleges, statisticians, engineers and executives", 1921)

"Although, the tabular arrangement is the fundamental form for presenting a statistical series, a graphic representation - in a chart or diagram - is often of great aid in the study and reporting of statistical facts. Moreover, sometimes statistical data must be taken, in their sources, from graphic rather than tabular records." (William L Crum et al, "Introduction to Economic Statistics", 1938)

"The primary purpose of a graph is to show diagrammatically how the values of one of two linked variables change with those of the other. One of the most useful applications of the graph occurs in connection with the representation of statistical data." (John F Kenney & E S Keeping, "Mathematics of Statistics" Vol. I 3rd Ed., 1954)

"A model is a qualitative or quantitative representation of a process or endeavor that shows the effects of those factors which are significant for the purposes being considered. A model may be pictorial, descriptive, qualitative, or generally approximate in nature; or it may be mathematical and quantitative in nature and reasonably precise. It is important that effective means for modeling be understood such as analog, stochastic, procedural, scheduling, flow chart, schematic, and block diagrams." (Harold Chestnut, "Systems Engineering Tools", 1965)

"To analyse graphic representation precisely, it is helpful to distinguish it from musical, verbal and mathematical notations, all of which are perceived in a linear or temporal sequence. The graphic image also differs from figurative representation essentially polysemic, and from the animated image, governed by the laws of cinematographic time. Within the boundaries of graphics fall the fields of networks, diagrams and maps. The domain of graphic imagery ranges from the depiction of atomic structures to the representation of galaxies and extends into the spheres of topography and cartography." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"One of the methods making the data intelligible is to represent it by means of graphs and diagrams. The graphic & diagrammatic representation of the data is always appealing to the eye as well as to the mind of the observer." (S P Singh & R P S Verma, "Agricultural Statistics", cca. 1969)

"Probably one of the most common misuses" (intentional or otherwise) of a graph is the choice of the wrong scale - wrong, that is, from the standpoint of accurate representation of the facts. Even though not deliberate, selection of a scale that magnifies or reduces - even distorts - the appearance of a curve can mislead the viewer." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"A graphic is an illustration that, like a painting or drawing, depicts certain images on a flat surface. The graphic depends on the use of lines and shapes or symbols to represent numbers and ideas and show comparisons, trends, and relationships. The success of the graphic depends on the extent to which this representation is transmitted in a clear and interesting manner." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Unlike some art forms. good graphics should be as concrete, geometrical, and representational as possible. A rectangle should be drawn as a rectangle, leaving nothing to the reader's imagination about what you are trying to portray. The various lines and shapes used in a graphic chart should be arranged so that it appears to be balanced. This balance is a result of the placement of shapes and lines in an orderly fashion." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The representational nature of maps, however, is often ignored - what we see when looking at a map is not the word, but an abstract representation that we find convenient to use in place of the world. When we build these abstract representations we are not revealing knowledge as much as are creating it." (Alan MacEachren, "How Maps Work: Representation, Visualization, and Design", 1995)

"Understanding how maps work and why maps work" (or do not work) as representations in their own right and as prompts to further representations, and what it means for a map to work, are critical issues as we embark on a visual information age." (Alan MacEachren, "How Maps Work: Representation, Visualization, and Design", 1995)

"A Venn diagram is a simple representation of the sample space, that is often helpful in seeing 'what is going on'. Usually the sample space is represented by a rectangle, with individual regions within the rectangle representing events. It is often helpful to imagine that the actual areas of the various regions in a Venn diagram are in proportion to the corresponding probabilities. However, there is no need to spend a long time drawing these diagrams - their use is simply as a reminder of what is happening." (Graham Upton & Ian Cook, "Introducing Statistics", 2001)

"A good way to evaluate a model is to look at a visual representation of it. After all, what is easier to understand - a table full of mathematical relationships or a graphic displaying a decision tree with all of its splits and branches?" (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis", 2002)

"Good numeric representation is a key to effective thinking that is not limited to understanding risks. Natural languages show the traces of various attempts at finding a proper representation of numbers. [...] The key role of representation in thinking is often downplayed because of an ideal of rationality that dictates that whenever two statements are mathematically or logically the same, representing them in different forms should not matter. Evidence that it does matter is regarded as a sign of human irrationality. This view ignores the fact that finding a good representation is an indispensable part of problem solving and that playing with different representations is a tool of creative thinking." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Information needs representation. The idea that it is possible to communicate information in a 'pure' form is fiction. Successful risk communication requires intuitively clear representations. Playing with representations can help us not only to understand numbers" (describe phenomena) but also to draw conclusions from numbers" (make inferences). There is no single best representation, because what is needed always depends on the minds that are doing the communicating." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"Why does representing information in terms of natural frequencies rather than probabilities or percentages foster insight? For two reasons. First, computational simplicity: The representation does part of the computation. And second, evolutionary and developmental primacy: Our minds are adapted to natural frequencies." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"A road plan can show the exact location, elevation, and dimensions of any part of the structure. The map corresponds to the structure, but it's not the same as the structure. Software, on the other hand, is just a codification of the behaviors that the programmers and users want to take place. The map is the same as the structure. […] This means that software can only be described accurately at the level of individual instructions. […] A map or a blueprint for a piece of software must greatly simplify the representation in order to be comprehensible. But by doing so, it becomes inaccurate and ultimately incorrect. This is an important realization: any architecture, design, or diagram we create for software is essentially inadequate. If we represent every detail, then we're merely duplicating the software in another form, and we're wasting our time and effort." (George Stepanek, "Software Project Secrets: Why Software Projects Fail", 2005)

"Graphs are pictorial representations of numerical quantities. It therefore seems reasonable to expect that the visual impression we get when looking at a graph is proportional to the numbers that the graph represents. Unfortunately, this is not always the case." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"The visual representation of a scale - an axis with ticks - looks like a ladder. Scales are the types of functions we use to map varsets to dimensions. At first glance, it would seem that constructing a scale is simply a matter of selecting a range for our numbers and intervals to mark ticks. There is more involved, however. Scales measure the contents of a frame. They determine how we perceive the size, shape, and location of graphics. Choosing a scale" (even a default decimal interval scale) requires us to think about what we are measuring and the meaning of our measurements. Ultimately, that choice determines how we interpret a graphic." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"A diagram is a graphic shorthand. Though it is an ideogram, it is not necessarily an abstraction. It is a representation of something in that it is not the thing itself. In this sense, it cannot help but be embodied. It can never be free of value or meaning, even when it attempts to express relationships of formation and their processes. At the same time, a diagram is neither a structure nor an abstraction of structure." (Peter Eisenman, "Written Into the Void: Selected Writings", 1990-2004, 2007)

"Graphical displays are often constructed to place principal focus on the individual observations in a dataset, and this is particularly helpful in identifying both the typical positions of datapoints and unusual or influential cases. However, in many investigations, principal interest lies in identifying the nature of underlying trends and relationships between variables, and so it is oten helpful to enhance graphical displays in wayswhich give deeper insight into these features.his can be very beneficial both for small datasets, where variation can obscure underlying patterns, and large datasets, where the volume of data is so large that effective representation inevitably involves suitable summaries." (Adrian W Bowman, "Smoothing Techniques for Visualisation" [in "Handbook of Data Visualization"], 2008)

"Heatmaps are two-dimensional graphical representations of data where the values of a variable are shown as colors. Heatmaps are compelling for two reasons. First, the intuitive nature of the color scale as it relates to temperature minimizes the amount of learning necessary to understand it. From experience, we know that yellow is warmer than green, orange is warmer than yellow, and red is hot. It is not difficult to then figure out that the amount of heat is proportional to the level of the represented variable. Second, heatmaps show the data directly over the stimulus. Because the data could not be any closer to the elements to which they pertain, little mental effort is required to read a heatmap." (Agnieszka Bojkon, "Informative or Misleading? Heatmaps Deconstructed", [in "Human-Computer Interaction: New Trends, 13th International Conference"] 2009)

"Data art is characterized by a lack of structured narrative and absence of any visual analysis capability. Instead, the motivation is much more about creating an artifact, an aesthetic representation or perhaps a technical/technique demonstration. At the extreme end, a design may be more guided by the idea of fun or playfulness or maybe the creation of ornamentation." (Andy Kirk, "Data Visualization: A successful design process", 2012)

"What is good visualization? It is a representation of data that helps you see what you otherwise would have been blind to if you looked only at the naked source. It enables you to see trends, patterns, and outliers that tell you about yourself and what surrounds you. The best visualization evokes that moment of bliss when seeing something for the first time, knowing that what you see has been right in front of you, just slightly hidden. Sometimes it is a simple bar graph, and other times the visualization is complex because the data requires it." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Creating effective visualizations is hard. Not because a dataset requires an exotic and bespoke visual representation - for many problems, standard statistical charts will suffice. And not because creating a visualization requires coding expertise in an unfamiliar programming language [...]. Rather, creating effective visualizations is difficult because the problems that are best addressed by visualization are often complex and ill-formed. The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"The main differences between Bayesian networks and causal diagrams lie in how they are constructed and the uses to which they are put. A Bayesian network is literally nothing more than a compact representation of a huge probability table. The arrows mean only that the probabilities of child nodes are related to the values of parent nodes by a certain formula" (the conditional probability tables) and that this relation is sufficient. That is, knowing additional ancestors of the child will not change the formula. Likewise, a missing arrow between any two nodes means that they are independent, once we know the values of their parents. [...] If, however, the same diagram has been constructed as a causal diagram, then both the thinking that goes into the construction and the interpretation of the final diagram change." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"Information visualization displays meet the definition of an art form in that there is an intended message to be communicated, and the principles of graphic design are applied as they are in other information graphics. Unlike other forms of representational art, InfoVis is a representational art of 'information' as an abstract phenomenon, with the goal of engaging the viewer with forms of interactivity that are not possible with a painting." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Knowing what graphic representation to apply is partially a function of the data themselves and partially from the designer’s understanding of the target audience viewing the graphic. The Internet and publications have many recommended charting types." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"When it comes to presenting categorical data, pie charts allow an impression of the size of each category relative to the whole pie, but are often visually confusing, especially if they attempt to show too many categories in the same chart, or use a three-dimensional representation that distorts areas. [...] Multiple pie charts are generally not a good idea, as comparisons are hampered by the difficulty in assessing the relative sizes of areas of different shapes. Comparisons are better based on height or length alone in a bar chart." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Heatmap is another representational way in which the frequencies of the various parameters of the data set is represented in different colors, much like an image captured by a thermal imaging camera in which the graph consists of varying temperatures and the temperatures are differentiated according to the colors." (Shreyans Pathak & Shashwat Pathak, "Data Visualization Techniques, Model and Taxonomy", 2020)

"Maps are a type of chart that can convey relationships about space and relationships between objects that we relate to in the real world. Their effectiveness as a communication medium is strongly influenced by a host of factors: the nature of spatial data, the form and structure of representation, their intended purpose, the experience of the audience, and the context in the time and space in which the map is viewed. In other words, maps are a ubiquitous representation of spatial information that we can understand and relate to." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"When dealing with meaningful visual representation, aspects of a representation's meaning can be altered by modifying its visual characteristics; these characteristics are extensively explored in semiotics, the study of signs and symbols and their use or interpretation." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

06 June 2026

📉Graphical Representation: Learning (Just the Quotes)

"The advantages proposed by [the graphical] mode of representation, are to facilitate the attainment of information, and aid the memory in retaining it: which two points form the principal business in what we call learning. Of all the senses, the eye gives the liveliest and most accurate idea of whatever is susceptible of being represented to it; and when proportion between different quantities is the object, then the eye has an incalculable superiority." (William Playfair, The Statistical Breviary", 1801)

"Learning to make graphs involves two things: (1) the techniques of plotting statistics, which might be called the artist's job; and (2) understanding the statistics. When you know how to work out graphs, all kinds of statistics will probably become more interesting to you." (Dyno Lowenstein, "Graphs", 1976)

"For many people the first word that comes to mind when they think about statistical charts is 'lie'. No doubt some graphics do distort the underlying data, making it hard for the viewer to learn the truth. But data graphics are no different from words in this regard, for any means of communication can be used to deceive. There is no reason to believe that graphics are especially vulnerable to exploitation by liars; in fact, most of us have pretty good graphical lie detectors that help us see right through frauds." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Visual thinking can begin with the three basic shapes we all learned to draw before kindergarten: the triangle, the circle, and the square. The triangle encourages you to rank parts of a problem by priority. When drawn into a triangle, these parts are less likely to get out of order and take on more importance than they should. While the triangle ranks, the circle encloses and can be used to include and/or exclude. Some problems have to be enclosed to be managed. Finally, the square serves as a versatile problem-solving tool. By assigning it attributes along its sides or corners, we can suddenly give a vague issue a specific place to live and to move about." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"Humans may crave absolute certainty; they may aspire to it; they may pretend, as partisans of certain religions do, to have attained it. But the history of science - by far the most successful claim to knowledge accessible to humans - teaches that the most we can hope for is successive improvement in our understanding, learning from our mistakes, an asymptotic approach to the Universe, but with the proviso that absolute certainty will always elude us. We will always be mired in error. The most each generation can hope for is to reduce the error bars a little, and to add to the body of data to which error bars apply." (Carl Sagan, "The Demon-Haunted World: Science as a Candle in the Dark", 1995)

"Conflicting with the idea of integrating evidence regardless of its these guidelines provoke several issues: First, labels are data. even intriguing data. [...] Second, when labels abandon the data points, then a code is often needed to relink names to numbers. Such codes, keys, and legends are Impediments to learning, causing the reader's brow to furrow. Third, segregating nouns from data-dots breaks up evidence on the basis of mode (verbal vs. nonverbal), a distinction lacking substantive relevance. Such separation is uncartographic; contradicting the methods of map design often causes trouble for any type of graphical display. Fourth, design strategies that reduce data-resolution take evidence displays in the wrong direction. Fifth, what clutter? Even this supposedly cluttered graph clearly shows the main ideas: brain and body mass are roughly linear in logarithms, and as both variables increase, this linearity becomes less tight." (Edward R Tufte, "Beautiful Evidence", 2006) [argumentation against Cleveland's recommendation of not using words on data plots]

"Heatmaps are two-dimensional graphical representations of data where the values of a variable are shown as colors. Heatmaps are compelling for two reasons. First, the intuitive nature of the color scale as it relates to temperature minimizes the amount of learning necessary to understand it. From experience, we know that yellow is warmer than green, orange is warmer than yellow, and red is hot. It is not difficult to then figure out that the amount of heat is proportional to the level of the represented variable. Second, heatmaps show the data directly over the stimulus. Because the data could not be any closer to the elements to which they pertain, little mental effort is required to read a heatmap." (Agnieszka Bojkon, "Informative or Misleading? Heatmaps Deconstructed", [in "Human-Computer Interaction: New Trends, 13th International Conference"] 2009)

"Infographics combine data with design to enable visual learning. This communication process helps deliver complex information in a way that is more quickly and easily understood. [...] In an era of data overload, infographics offer your audience information in a format that is easy to consume and share. [...] A well-placed, self-contained infographic addresses our need to be confident about the content we’re sharing. Infographics relay the gist of your information quickly, increasing the chance for it to be shared and fueling its spread across a wide variety of digital channels." (Mark Smiciklas, "The Power of Infographics: Using Pictures to Communicate and Connect with Your Audiences", 2012)

"Learning comes from doing. One must write every day, even twice a day, to get the feel of words, the tenor of voice and a sense of flow. Writing theory is fine, but without the hands-on experience, without reading what is written - outloud to oneself - writing as an extension of the writer is impossible to achieve." (Steven Heller, "Writing and Research for Graphic Designers: A Designer's Manual to Strategic Communication and Presentation", 2012) 

"Creating a data fluent organization doesn’t just happen. It starts with people who love using data as a tool to improve their job performance - people who have learned to converse with others in the language of data. It needs people who expect and demand better, more useful data products from themselves and others. It starts with you." (Zach Gemignani et al, "Data Fluency", 2014)

"Sometimes bar charts are avoided because they are common. This is a mistake. Rather, bar charts should be leveraged because they are common, as this means less of a learning curve for your audience." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"Just because there’s a number on it, it doesn’t mean that the number was arrived at properly. […] There are a host of errors and biases that can enter into the collection process, and these can lead millions of people to draw the wrong conclusions. Although most of us won’t ever participate in the collection process, thinking about it, critically, is easy to learn and within the reach of all of us." (Daniel J Levitin, "Weaponized Lies", 2017)

05 June 2026

📉Graphical Representation: Quality (Just the Quotes)

"The essential quality of graphic representations is clarity. If the diagram fails to give a clearer impression than the tables of figures it replaces, it is useless. To this end, we will avoid complicating the diagram by including too much data." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

"Charts and graphs represent an extremely useful and flexible medium for explaining, interpreting, and analyzing numerical facts largely by means of points, lines, areas, and other geometric forms and symbols. They make possible the presentation of quantitative data in a simple, clear, and effective manner and facilitate comparison of values, trends, and relationships. Moreover, charts and graphs possess certain qualities and values lacking in textual and tabular forms of presentation." (Calvin F Schmid, "Handbook of Graphic Presentation", 1954)

"Evidence is evidence, whether words, numbers, images, din grams- still or moving. It is all information after all. For readers and viewers, the intellectual task remains constant regardless of the particular mode of evidence: to understand and to reason about the materials at hand, and to appraise their quality, relevance. and integrity." (Edward R Tufte, "Beautiful Evidence", 2006)

"Making a presentation is a moral act as well as an intellectual activity. The use of corrupt manipulations and blatant rhetorical ploys in a report or presentation - outright lying, flagwaving, personal attacks, setting up phony alternatives, misdirection, jargon-mongering, evading key issues, feigning disinterested objectivity, willful misunderstanding of other points of view - suggests that the presenter lacks both credibility and evidence. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and a moral activity." (Edward R Tufte, "Beautiful Evidence", 2006)

"Making an evidence presentation is a moral act as well as an intellectual activity. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and a moral activity." (Edward R Tufte, "Beautiful Evidence", 2006)

"The Sixth Principle for the analysis and display of data: 'Analytical presentations ultimately stand or fall depending on the quality, relevance, and integrity of their content.' This suggests that the most effective way to improve a presentation is to get better content. It also suggests that design devices and gimmicks cannot salvage failed content." (Edward R Tufte, "Beautiful Evidence", 2006)

"A beautiful visualization has a clear goal, a message, or a particular perspective on the information that it is designed to convey. Access to this information should be as straightforward as possible, without sacrificing any necessary, relevant complexity. [...] Most importantly, beautiful visualizations reflect the qualities of the data that they represent, explicitly revealing properties and relationships inherent and implicit in the source data. As these properties and relationships become available to the reader, they bring new knowledge, insight, and enjoyment." (Noah Iliinsky, "On Beauty", [in "Beautiful Visualization"] 2010)

"While the information is of the utmost importance when it comes to soundness, what is done with the information - essentially, how it is designed - is also important. With this in mind, there are two things to consider: format and design quality. If an inappropriate format is used, the outcome will be inferior. Similarly, if the design misrepresents or skews the information deliberately or due to user error, or if the design is inappropriate given the subject matter, it cannot be considered high quality, no matter how aesthetically appealing it appears at first glance." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"Even with a solid narrative and insightful visuals, a data story cannot overcome a weak data foundation. As the master architect, builder, and designer of your data story, you play an instrumental role in ensuring its truthfulness, quality, and effectiveness. Because you are responsible for pouring the data foundation and framing the narrative structure of your data story, you need to be careful during the analysis process. Because all of the data is being processed and interpreted by you before it is shared with others, it can be exposed to cognitive biases and logical fallacies that distort or weaken the data foundation of your story." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"It is dangerous to do an analysis and merge data with very different quality profiles. As a general rule, the veracity of merged data is only as good as the worst data that has been merged. [...] Not knowing the quality of the data being analyzed jeopardizes the entire analysis." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

01 June 2026

✏️Christian Tominski - Collected Quotes

"A difficulty with combined bivariate visualizations is that the connection between the individual displays has to be established by the observer mentally. That is, as the eyes move from one bivariate display to the next, the observer has to keep track of the visited dots in order to form a complete understanding of data tuples. Visualization techniques based on polylines aim to tackle this difficulty. The basic strategy is to create m axes, one for each attribute, and n polylines, one for each data tuple. The polyline of an m-variate data tuple is constructed as follows. For each attribute value of the data tuple, a position is computed at the corresponding attribute axis. The m positions that we obtain are then connected to form the polyline that represents the entire tuple." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"A scatter plot consists of two orthogonally aligned axes that represent the value ranges of two data variables. Dots are placed in the space spanned by the axes in order to visualize the data elements. Conceptually, this corresponds to a mapping of data to position. A first data variable is mapped with respect to the horizontal x-axis, and a second variable with respect to the vertical y-axis." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"A stream graph is a technique for visualizing multivariate temporal data with a linear arrangement of time. As in the previous two examples, time is shown along the horizontal display axis from left to right. The multivariate data attributes are visualized as stacked streams, there is one stream for each attribute. The actual visual encoding is based on varying the thickness of the streams along the horizontal axis. That is, the vertical height of a stream at a particular horizontal position represents the underlying data value at the corresponding time. Various alternatives exist for ordering the streams and shaping the overall stack of streams." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"An important property of a data domain is its scale. The scale determines what relations and operations are possible for the data values in the domain. At the top level, we can differentiate qualitative (or categorical) and quantitative (or numerical) data. At a second level, we can further categorize qualitative data into nominal and ordinal data, and quantitative data into discrete and continuous data." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Description is all about characterizing an observation by the associated data elements, and thereby deriving a specification for an observation. For example, an outlier can be described by its characteristic values and, if available, its spatio-temporal context. A proper description may serve as a basis for configuring further analysis steps. In particular, a description allows for sharing first insights with other people, who can later be involved in verifying the analysis results." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Explanation means identifying all contributing data and finding the main causes behind an observation. This involves investigating several questions. Is the observation by itself significant or did we just interpret too much into the noise among the data? Does the observation re-occur throughout the data or are we looking at a singular outlier produced by unli kely circumstances? If the observation does re-occur, does it show up reliably under the same conditions, thus forming a pattern, or are its appearances seemingly random?" (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Node-link, matrix, and implicit representations are suited for different graph data. Node-link diagrams are good for sparse networks, which have a moderate number of edges. Dense networks with many edges are best visualized using a matrix. Trees, as we just said, are nicely represented by implicit approaches." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Often, finding the spatial scale that best matches the task at hand is a trial-and-error procedure. It may even be necessary to create further spatial scales by subsuming or subdividing spatial units. Coarser scales can be derived from the original scale by means of a suitable aggregation strategy. This includes the application of aggregation functions such as average, sum, or count. For the creation of finer scales, a suitable distribution strategy is required to assign data values to the newly specified sub-regions. Usually, additional context information is necessary to arrive at semantically meaningful aggregations and distribution." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Presentation is to communicate confirmed analysis results. While explanation and confirmation were about convincing ourselves, presentation is about convincing others of what we have found in the data. This is best done by telling a story about the data, the analysis, and the results. Such a story can act at different levels of emphasis. We may inform an audience by letting the results speak for themselves, explicate the results to an audience, or even persuade an audience into agreement with the results. The audience in this context can be the listeners of a talk, the readers of an article, or colleagues participating in a scientific discussion." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The simple, yet very effective idea of table-based visualization is to retain the tabular layout of spreadsheets, but to replace the textual representation of data values by a visual representation. A visual representation will not only make the interpretation of the data much easier, it will also require less display space." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The advantage of sequencing views in time is that each view can fully utilize the display space. There is no need to divide the space among views. Obviously, sequencing views in time is particularly suited to convey temporal characteristics of data. It can also be helpful to take the user on a journey from one data facet to another. However, presenting views in quick succession to the user also has some limitations. For example, it could be difficult to make sense of all the information provided during a sequence of views. Especially when sequences take a long time, users may be unable to follow and could drown in an indigestible flood of visual representations. Therefore, it is mandatory to provide interactive controls to pause, slow down, reverse, and advance the presentation." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The cycle plot is a technique particularly designed for the combined visualization of linear and cyclic components of temporal data. The basic idea is to show the cyclic component as a line plot into which several smaller plots are embedded to visualize the linear component. As such, the cycle plot is a kind of nested visualization." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The triangular model is a technique particularly for visualizing intervals. It is based on two coordinate axes, the horizontal one representing time and the vertical one representing duration. In the triangular model, an interval is represented as a dot with two attached arms. The dot is placed so that the arms connect the time axis exactly at the start and the end of the represented interval. The point’s height corresponds to the interval’s duration." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The triangular model is useful when it comes to reasoning about properties and the relationships of multiple intervals, because it generates easily distinguishable visual patterns for all possible interval relations. There is even room for visualizing data that might be associated with the intervals. The dot-based encoding would allow for resizing or coloring the dots based on some attribute values. Yet, the triangular model is only of limited use for multivariate attributes." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"When the data to be analyzed become more complex, it is no longer feasible to indiscriminately present each and every aspect of the data in a single view. When we reach this point, it makes sense to create several dedicated visual representations, each focused on communicating a particular aspect or facet of the data. The question is how several such views can be presented to the user in order to convey a comprehensive picture?" (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"With each variable being added to the visual mapping, the richness of the visual representation is increased. Theoretically, we could add yet another visual variable, for example, by texturing the shapes. However, from a practical point of view, there are limits. While a rich visual mapping opens up the possibility to make a wider range of analytic discoveries, the downside is that the mental effort required to digest the visual representation increases as well. Therefore, it is really important to balance the visual mapping according to the task and the data." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019) 

31 May 2026

📉Graphical Representation: Reality (Just the Quotes)

"Judgment must be used in the showing of figures in any chart or numerical presentation, so that the figures may not give an appearance of greater accuracy than their method of collection would warrant. Too many otherwise excellent reports contain figures which give the impression of great accuracy when in reality the figures may be only the crudest approximations. Except in financial statements, it is a safe rule to use ciphers whenever possible at the right of all numbers of great size. The use of the ciphers greatly simplifies the grasping of the figures by the reader, and, at the same time, it helps to avoid the impression of an accuracy which is not warranted by the methods of collecting the data." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"A fundamental value in the scientific outlook is concern with the best available map of reality. The scientist will always seek a description of events which enables him to predict most by assuming least. He thus already prefers a particular form of behavior. If moralities are systems of preferences, here is at least one point at which science cannot be said to be completely without preferences. Science prefers good maps." (Anatol Rapoport, "Science and the goals of man: a study in semantic orientation", 1950)

"It is really questionable - though bordering on heresy to put the question - whether we would be any the worse off if the whole bag of tricks were scrapped. So many of these index numbers are so ancient and so out of date, so out of touch with reality, so completely devoid of practical value when they have been computed, that their regular calculation must be regarded as a widespread compulsion neurosis. Only lunatics and public servants with no other choice go on doing silly things and liking it." (Michael J Moroney, "Facts from Figures", 1951)

"Data analysis typically begins with straight-line models because they are simplest, not because we believe reality is inherently linear. Theory or data may suggest otherwise [...]" (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"One important aspect of reality is improvisation; as a result of special structure in a set of data, or the finding of a visualization method, we stray from the standard methods for the data type to exploit the structure or the finding." (William S Cleveland, "Visualizing Data", 1993)

"Because 'reality' and 'truth' are essential in these figures, it is important to be straightforward and thoughtful in the selection of the areas to be used. Manipulation such as enlargement, reduction, and increase or decrease of contrast must not distort or change the information. Touch-up is permissible only to eliminate distracting artifacts. Labels should be used judiciously and sparingly, and should not hide or distract from important information." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"New information is constantly flowing in, and your brain is constantly integrating it into this statistical distribution that creates your next perception (so in this sense 'reality' is just the product of your brain’s ever-evolving database of consequence). As such, your perception is subject to a statistical phenomenon known in probability theory as kurtosis. Kurtosis in essence means that things tend to become increasingly steep in their distribution [...] that is, skewed in one direction. This applies to ways of seeing everything from current events to ourselves as we lean 'skewedly' toward one interpretation, positive or negative. Things that are highly kurtotic, or skewed, are hard to shift away from. This is another way of saying that seeing differently isn’t just conceptually difficult - it’s statistically difficult." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"Any chart is a simplification of reality, and it reveals as much as it hides. Therefore, it’s always worth asking ourselves: What other patterns or trends may be hidden behind the data displayed on the chart?" (Alberto Cairo, "How Charts Lie", 2019)

"No chart can ever capture reality in all its richness. However, a chart can be made worse or better depending on its ability to strike a balance between oversimplifying that reality and obscuring it with too much detail." (Alberto Cairo, "How Charts Lie", 2019)

🎯C S V Murthy - Collected Quotes

"[a scatter diagram] is a graph in which the values of two variables are plotted along two axes, the pattern of the resulting points revealing any correlation present. It graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve. The better the correlation, the tighter the points will have the line." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Decision tree is a graphical representation of a decision situation in which decision situation points (nodes) are connected together by arcs (one for each alternative on a decision) and terminate in ovals (the action that is the result of all the decisions made on the path leading to that oval). [...] A tree is made up of multilevel group of elements called nodes. A node is nothing more than a point at which subsidiary data originate. This particular logical data structure is called a tree simply because it looks like a tree, usually turned upside down. Genealogists use a schema called a tree to show ancestral descent of a person, family or group. Data associated by a tree schema are hierarchical. They branch from a point or node without forming loops or polygons. Data presented in a tree structure make two conditions. First, the tree must have a single root node. Second, all nodes other than the root node must be related to one and only one higher level node." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Every interaction includes both presentation and dialogue. Presentation provides the layout of information on a computer screen. Dialogue provides an interaction sequence between a user and computer. Interfaces and dialogue will help users to solve their problems. Presentation must include objects that the user can readily understand in terms of their daily work. The dialogue must correspond to user’s normal work and to their mental model of the system (Mental model is the way a user sees a problem). Both presentation/dialogue depend on what users are doing." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Information relevance refers to the extent to which information is appropriate for the decision-making situation facing the manager. Extraneous or extra information distracts the decision-maker from the assigned task and information overload frustrates the decision-maker and impairs the decision-making process. Relevant information must pertain to the problems, decisions and responsibilities of the recipient." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Information that is complete means information that covers key issues and is sufficient to support the decision-making situation at hand without critical omissions. The more complete a body of information, is obviously, the more expensive it is to develop and maintain. Care must also be taken not to provide extra information than needed, due to its expense, and not to provide so much information that the recipient will suffer from information overload (information indigestion)." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Ridge Regression is a technique for analysing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large. So, they may be far from the true value. By adding a degree of bias to the regression estimates, principal components regression reduces the standard errors." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Spectral methods are a class of techniques used in applied mathematics and scientific computing to numerically solving certain differential equations, potentially involving the use of the fast Fourier transform. This is an algorithm that samples a signal over a period of time and divides it into its frequency components. These components are single sinusoidal oscillations at distant frequencies each with their own amplitude and phase." (C S V Murthy, "Data and Businesss Analytics", 2020)

"The concept of programmed decisions is important because the ultimate (and unachievable) goal of information systems is to provide purely programmed decisions. Because this is not possible, we seek to provide the optimum type of information to the human decision-maker, who then makes non-programmable decisions. Decisions lend themselves to programming techniques if they are repetitive and routine, and if a procedurs can be worked out for handling them so that each is neither an ad hoc decision nor one to be treated as a new situation each time it arises." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Timeliness means that information is available when it is needed. Most managers function in a dynamic environment of change, demands updated and current information. Computerised information systems have the ability to gather, sort, analyse, store, retrieve, and transmit large amounts of information in a very short period of time. Completeness of information is the extent to which information is all there." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Understanding complex information systems begins with a clear understanding of information and its general characteristics. Information can be considered as the very blood of an organisation, but it must be properly understood and appropriately distinguished from data. Too many times, the terms ‘data’ and ‘information’ are used interchangeably, but the terms most clearly mean entirely different things. Data should be defined as raw, unsummarised and unanalysed facts. Information is data that has been presented in such a way as to alter the receiver’s understanding. Data are the raw materials from which information is derived. This is a necessary distinction for the manager to make, because loads of data can be generated, without producing even one iota of useful information." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Visualisation is any technique for creating images, diagrams or animations to communicate a message; techniques used to communicate data or information by encoding it as visual objects, e.g., points, lines or bars contained in graphics. One of the most important benefits of visualisation is that it allows us visual access to huge amounts of data in easily digestible visuals. Well designed data graphics are usually the simplest, and at the same time, the most powerful." (C S V Murthy, "Data and Businesss Analytics", 2020) 

30 May 2026

✏️Gerald Benoît - Collected Quotes

"A model links to the viewers’ engagement with the visualization. Can the viewers identify the purpose and create a relationship in their mind between the nascent message of your visualization and their knowledge and work practices? When sketching out the design and considering the data, what is the first intention of the design? How will viewers interpret the goal of the visualization?" (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"A well-designed 'information visualization' is interactive, allowing viewers to converse with the data: gaining knowledge, exposing insights, and engaging with the data in unexpected ways. It is only through these conversations that the otherwise static display of data transforms into meaningful information." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Before progressing to analysis and visualization of the data, examine the data for inconsistencies and missing values. Data that fall outside an expected range, values that are missing or null, or have a different encoding or data type need to be addressed." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Contemporary information specialists should at least be conversant in the pros/cons, benefits and liabilities, tech and data requirements of each software product they might use." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Experience shows that both neophyte designers of visualizations and commercial visualization applications often overlook the role that type plays in legibility, aesthetics, and meaning construction. Yet the most successful visualizations are those where the details of data, design, and aesthetics are in harmony, and the interactivity allows the end user to understand the explanation and to explore." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"For an information visualization specialist, we must weigh the impact of the purely visual aspects of our designs as well applying visual norms that facilitate interpretation. Finally, we integrate data as the foundation of the visualization - all in a way where each coheres—that is, each contributes the same message to the viewer albeit in different languages (textual, data, interactive, and visual). It’s not useful nor possible to study themes of the aesthetic, technical, and applications of visuals independently of the others." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Information visualization displays meet the definition of an art form in that there is an intended message to be communicated, and the principles of graphic design are applied as they are in other information graphics. Unlike other forms of representational art, InfoVis is a representational art of 'information' as an abstract phenomenon, with the goal of engaging the viewer with forms of interactivity that are not possible with a painting." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Knowing what graphic representation to apply is partially a function of the data themselves and partially from the designer’s understanding of the target audience viewing the graphic. The Internet and publications have many recommended charting types." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"The problem-solving approach favored in the big data/data science realm is datacentric. This is likely because of the similarities between traditional data- and text-mining activities that incorporate visualizing results for exploration and explanation. This field contributes to receptiveness by institutions and the public to very large datasets and the computational infrastructure that provides the data. For data scientists, however, the ultimate interest is using visuals to help chart the data, as opposed to interacting with them. The emphasis is on large datasets and machine learning." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"The rule of thirds applies to fonts, too. The use of fonts is more subtle than one might imagine at first glance. The extreme subtlety of detail when designing fonts contributes to an equally subtle affective impact on a design. The choice of fonts also contributes more evidently to legibility. To a graphic designer, the choice of font contributes to the overall design, addressing more than legibility because the design is tempered with sensitivity to the limitations of the output device (monitor), size of the font, and the overall aesthetic tone." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

" [...] the rule of three applies to the choice of typography, too. In design practice, there is usually a heading font, body text, and then a font for details. [...]  Even though two of the roles (title and body) are the same font name, one is bold and the other is regular. This equates to two fonts. It is common, too, to use a serif font for a title and then a sans serif for the other two (or vice versa). Learning which fonts to use comes only from practice and studying examples." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"When teaching design composition for posters and for websites, there are some introductory rules [...]. One is the 'rule of thirds'. This equates to (no more than) three colors in the design, three typefaces, and three display areas in a design composition [...]" (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

📉Graphical Representation: Projections (Just the Quotes)

"Whatever relates to extent and quantity may be represented by geometrical figures. Statistical projections which speak to the senses without fatiguing the mind, possess the advantage of fixing the attention on a great number of important facts." (Alexander von Humboldt, 1811)

"Business executives, to be efficient, must constantly plan ahead, but there are pitfalls in attempting to estimate the future growth of a business from a chart of its past history. In the first place, there are too many uncontrollable factors entering into the situation to make the most careful estimate of future growth anything more than a shrewd guess, dependent upon all internal and external conditions remaining the same. To project the growth curve of a business into the future provides a good mark to shoot at, but a bank loan is seldom obtainable on the strength of such a projection."  (Walter E Weld, "How to Chart; Facts from Figures with Graphs", 1959)

"Charts not only tell what was, they tell what is; and a trend from was to is (projected linearly into the will be) contains better percentages than clumsy guessing." (Robert A Levy, "The Relative Strength Concept of Common Stock Forecasting", 1968)

"There is no end to the information we can use. A 'good' map provides the information we need for a particular purpose - or the information the mapmaker wants us to have. To guide us, a map’s designers must consider more than content and projection; any single map involves hundreds of decisions about presentation." (Peter Turchi, "Maps of the Imagination: The writer as cartographer", 2004)

"The first thing you must understand is that information design is not limited to the visualization of data, in presentation design or any other application. It can and should be used to visualize other concepts such as hierarchy (org charts), anatomy (portfolio allocation), and chronology (timeline of events). Beyond the bar graphs showing sales figures and monthly projections, there are many more opportunities to explain concepts with visuals that will engage your audience and clarify your key points."  (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"Conceptually, mosaic plots for s + 1 factors in strength s designs can be used for any s; in practice, the idea is limited by space constraints, especially for accommodating labels for the factor levels. All four margins are used for four-factor projections; with the next dimension, one margin has to be used for two factors. In practice, one will rarely consider mosaic plots for more factors than four at a time." (Ulrike Grömping, "Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs", The American Statistician Vol. 68 (2), 2014)

"A well-designed graph clearly shows you the relevant end points of a continuum. This is especially important if you’re documenting some actual or projected change in a quantity, and you want your readers to draw the right conclusions. […]" (Daniel J Levitin, "Weaponized Lies", 2017)

"All maps lie because they are based on the principle of projecting a spherical surface, the Earth, onto a plane. All maps distort some geographic feature, such as the sizes of the areas represented or the shapes of those areas."  (Alberto Cairo, "How Charts Lie", 2019)

✏️ Leandro N de Castro - Collected Quotes

"A bar chart is similar to a line chart, except that each data point is replaced by a rectangle with a height proportional to the value. The rectangle is usually centered on the spatial attribute of the data, and its width is often uniform. When values are categorical or discrete and cannot be shown in a series, a bar chart may be a suitable alternative for the line chart. Similarly to the case of a line chart, it is possible to create multivariate bar charts by stack‑ing the bars on top of each other in a form of superimposition easy to interpret." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"A scatterplot is a data visualization graph that uses dots to represent the relationship between two quantitative variables. One variable, called the explanatory variable, is plotted on the x‑axis, and the other variable, called the response variable, is plotted on the y‑axis. It is also possible to include a third categorical variable, represented by different dot colors. Each dot represents an individual data point, and the colors, when used, represent the categories of the dots. Therefore, the data point is organized into two or three columns, one for each variable, and each data point is plotted on the graph using two coordinates, one for each variable, with various colors representing each category.,." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Closure is a feature related to our capability of completing (closing) an object or a shape that is incomplete, that is, one that has some parts missing. The preattentive processing of closure is also automatic, not requiring conscious effort. For example, when looking at any shape, e.g., a circle or a square, with a small part missing, our brain automatically and preattentively perceives whether the shape is incomplete and fills these gaps. Preattentive processing of closure can be used in visual communication to create recognizable symbols and logos." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Color is a powerful visual tool to encode data and convey different meanings, such as  categories, magnitude, visual hierarchy, and even emotions. Using different hues, saturations, and brightness levels can help differentiate between categories or show patterns in the data." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Curvature is another preattentive feature that leads to a fast detection of changes in the degree of curvature, bending, or angularity of a shape or line, such as the presence of a more or less curved line in a group of otherwise similar lines. The degree of curvature in a line or shape can be used to represent different quantities or values, for instance, a smaller or larger number of peaks in a function." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Data visualization, by contrast, focuses on the visual representation of data in such a way that its values, structure, nature, type, and variability are accurately expressed by means of graphs. It aims to support the exploration and understanding of data, the identi‑fication of patterns, trends, distributions, correlations, and anomalies, the communicationof insights, and aid in decision‑making." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Differences in orientation can help us differentiate between items (e.g., data points, lines, objects, etc.) or extract information about the data. For example, using vertical bars in a bar chart can help differentiate between categories, while using horizontal bars can emphasize the magnitude of the data. Angles and direction can be used to convey information, such as trends, movement, sense of depth, or changes in values." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"In data visualization, texture is the visual quality of an object related to its roughness, pattern, or smoothness. It can be created using a variety of techniques, for example, using different line styles, brushes, patterns, and even special effects. Differences in texture can help distinguish between data points or objects, create visual hierarchies, or convey infor‑mation about the data. For example, using different textures for different categories can help viewers quickly identify and differentiate patterns. Like the other features described here, the texture is usually processed preattentively, without the need for focused attention." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Length is another preattentive visual property that can be used to create visual contrast, differences, importance, and proportions. The perception of differences in length normally occurs automatically and rapidly, without conscious effort or attention. It can be used in visual communication to quickly draw attention to important information or to create a visual hierarchy. For example, in a graph, longer bars may indicate larger values or quanti‑ties; in a map, longer lines may indicate longer distances; in a drawing, longer items may convey a sense of flow, etc." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Line charts are useful for identifying patterns and trends in a one‑dimensional sequence of univariate data, that is, continuous data over time with a single value per data item. They map the sequence data (e.g., time) to one dimension, typically the x‑axis, and the data value to another dimension, typically the y‑axis, forming a line; or to the color of a mark or region along the spatial axis, forming a bar. The data is adjusted in size to be within the limits of the display attribute." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Preattentive features, such as color, shape, orientation, and size, are those basic visual properties that are processed automatically, without conscious effort or attention. By understanding preattentive features, data analysts can create effective data visualization designs that make use of them to convey information more efficiently and accurately to the audience." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Size is a preattentive feature that exerts a similar effect in vision as that exerted by the line width, that is, to detect differences quickly and automatically in items (e.g., objects, data points, font sizes, etc.). Differences in size can draw attention to specific data points, indicate hierarchy, emphasize specific items, or convey information about the magnitude of the data. Variation in size can be used to represent different quantities or values, where larger sizes may indicate higher values or importance, while smaller sizes may indicate lower values or importance." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Preattentive processing of 3D (three‑dimensional) properties allows us to detect the depth and spatial relationships between objects, such as the presence of an object that appears to be closer or farther away than the others, without the need for focused attention. Perspective, lighting, size, or shading can be used to create the illusion of depth and convey information, such as relationships between variables." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The histogram is a useful visualization technique to explore the pattern of a single variable distribution, where the x‑axis represents the range of values, and the y‑axis represents the absoluteor relative frequency of data points within each bin. Histograms allow the exploration of cen‑tral tendency measures, such as the mean and median; dispersion measures, such as the stan‑dard deviation; and range, and shape, such as skewness and kurtosis. It also helps to identify outliers or unusual values and to reveal potential biases or errors in the data collection process." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The preattentive processing of density occurs automatically and rapidly, without conscious effort or attention, and can be used in visual communication to create contrast and emphasize importance or relevance. This feature can be swiftly detected by the presence of varying numbers of objects (e.g., data points or shapes) in a given region of the space, rep‑resenting different quantities or values. For instance, in a chart or graph, a higher density of data points can be used to represent a larger quantity, a more significant trend, or a more exciting or energetic area. By making use of the preattentive processing of density, design‑ers can create effective visual designs that convey information quickly and efficiently to the viewer." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The preattentive processing of markings (e.g., stripes, dots, crosses, stars, hatchings, etc.) includes various visual properties, such as texture, shading, and patterns. These properties allow us to swiftly detect differences and similarities between objects or regions, such as the presence of a repeating pattern in a group of otherwise random shapes. The presence or absence of certain markings, such as dots or squares, can be used to represent different categories or values." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of closure states that incomplete objects are perceived as complete because our brain tends to fill the gaps to create the complete image. Note that closure is also a pre‑attentive feature and thus plays a key role not only in the quick filling of gaps or completion of shapes, but also in the organization of the information to be conveyed."(Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of common fate proposes that objects that move together or change similarly tend to be perceived as a group or a pattern. In this case, graphs that allow visualizing data obeying this principle will have to embody a type or a sense of motion. To illustrate this principle, let us consider a motion chart, a streamgraph, and a force‑directed graph. The motion chart is a visualization method that shows how data changes over time; the streamgraph is a stacked area graph that shows the changes in a set of data over time; and the force‑directed graph is a network visualization that shows the relationships of nodes in a graph. In all cases, there is a sense of common fate in the data." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of continuity states that objects that are arranged in a smooth, continuous way are more likely to be perceived as a single object, even if their pattern is interrupted. The line chart, the Sankey diagram, and the scatterplot are good examples of the principle of continuity in the use of Gestalt theory in data visualization." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of figure‑ground, also called figure‑field, states that objects are perceived as either being in the foreground or the background. One way of forcing this principle is by using contrasting colors in the background and foreground of an image, for instance, black and white, blue and orange, green and purple, red and green, yellow and purple, pink and green, and others. However, many of these pairs are not suitable for technical and scientific works, and thus, the recommendation is to use colors with parsimony." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of proximity proposes that objects that are close to one another tend to be perceived as a group or a pattern. In data visualization, the heatmap, the scatterplot, and the bar chart are good examples of methods that account for the principle of proximity. The heatmap is a graph in which the values of a matrix are represented by colors, which are a preattentive feature, and neighboring cells in the matrix convey a sense of organization and relationship. The scatterplot places similar data values close to one another, grouping them in the plot. In a bar chart, related data values are placed close together in the bars, allowing a visual association among them." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of similarity proposes that objects that share similar characteristics, such as color or form, tend to be perceived as a group or a pattern. Examples of data visualization techniques that account for the similarity principle in Gestalt theory include a line chart in which lines representing different categories have the same style, a bar chart in which the bar patterns or colors indicate the same group or category, and a scatterplot with different markers representing different categories of categorical variables." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of symmetry states that objects that are symmetrical, or have a balanced appearance, tend to be perceived as a group or a pattern. Some data visualization graphs that can be used to explore this principle are the boxplot with boxes symmetrically placed around the median (Q2), the radar chart displaying multivariate data as a bidimensional chart with quantitative variables, and the mirrored bar chart with two sets of bars with mirrored values displayed." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Preattentive processing of position allows us to quickly detect changes in location, such as the presence of a dot or other object that is slightly displaced from the others. The spa‑tial location of visual elements can also be used to guide the viewer’s attention or encode information, such as ranking, hierarchy, or relationship (grouping)." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The preattentive processing of shape is a basic visual property that enables us to swiftly 
detect similarities and differences between items based on their shape, without requir‑
ing conscious effort or attention. For instance, in a picture with squares and circles, one 
can quickly differentiate one from the other based on their shapes. Similarly, using differ‑
ent shapes for different forms or categories, or using a shape that is indicative of the data (e.g., a circle for data on a map), can help viewers quickly identify patterns." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

29 May 2026

📉Graphical Representation: Uncertainty (Just the Quotes)

"A histogram consists of the outline of bars of equal width and appropriate length next to each other. By connecting the frequency values at the position of the nominal values" (the midpoints of the intervals) with straight lines, a frequency polygon is obtained. Attaching classes with frequency zero at either end makes the area" (the integral) under the frequency polygon equal  to that under the histogram." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"For linear dependences the main information usually lies in the slope. It is obvious that those points that lie far apart have the strongest influence on the slope if all points have the same uncertainty. In this context we speak of the strong leverage of distant points; when determining the parameter 'slope' these distant points carry more effective weight. Naturally, this weight is distinct from the 'statistical' weight usually used in regression analysis." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is important to pay heed to the following detail: a disadvantage of logarithmic diagrams is that a graphical integration is not possible, i.e., the area under the curve" (the integral) is of no relevance." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Perception requires imagination because the data people encounter in their lives are never complete and always equivocal. [...] We also use our imagination and take shortcuts to fill gaps in patterns of nonvisual data. As with visual input, we draw conclusions and make judgments based on uncertain and incomplete information, and we conclude, when we are done analyzing the patterns, that out picture is clear and accurate. But is it?" (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"After you visualize your data, there are certain things to look for […]: increasing, decreasing, outliers, or some mix, and of course, be sure you’re not mixing up noise for patterns. Also note how much of a change there is and how prominent the patterns are. How does the difference compare to the randomness in the data? Observations can stand out because of human or mechanical error, because of the uncertainty of estimated values, or because there was a person or thing that stood out from the rest. You should know which it is." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"The data is a simplification - an abstraction - of the real world. So when you visualize data, you visualize an abstraction of the world, or at least some tiny facet of it. Visualization is an abstraction of data, so in the end, you end up with an abstraction of an abstraction, which creates an interesting challenge. […] Just like what it represents, data can be complex with variability and uncertainty, but consider it all in the right context, and it starts to make sense." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Estimates based on data are often uncertain. If the data were intended to tell us something about a wider population (like a poll of voting intentions before an election), or about the future, then we need to acknowledge that uncertainty. This is a double challenge for data visualization: it has to be calculated in some meaningful way and then shown on top of the data or statistics without making it all too cluttered." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"In statistics, 'error' is not a synonym for 'mistake', but rather a synonym for 'uncertainty.' Error means that any estimate we make, no matter how precise it looks in our chart or article [...] is usually a middle point of a range of possible values." (Alberto Cairo, "How Charts Lie", 2019)

"Uncertainty confuses many people because they have the unreasonable expectation that science and statistics will unearth precise truths, when all they can yield is imperfect estimates that can always be subject to changes and updates." (Alberto Cairo, "How Charts Lie", 2019)

28 May 2026

🔭Data Science: Chance (Just the Quotes)

"The universal cause is one thing, a particular cause another. An effect can be haphazard with respect to the plan of the second, but not of the first. For an effect is not taken out of the scope of one particular cause save by another particular cause which prevents it, as when wood dowsed with water, will not catch fire. The first cause, however, cannot have a random effect in its own order, since all particular causes are comprehended in its causality. When an effect does escape from a system of particular causality, we speak of it as fortuitous or a chance happening […]" (Thomas Aquinas, "Summa Theologica", cca. 1266-1273)

"[…] chance, that is, an infinite number of events, with respect to which our ignorance will not permit us to perceive their causes, and the chain that connects them together. Now, this chance has a greater share in our education than is imagined. It is this that places certain objects before us and, in consequence of this, occasions more happy ideas, and sometimes leads us to the greatest discoveries […]" (Claude A Helvetius, "On Mind", 1751)

"But ignorance of the different causes involved in the production of events, as well as their complexity, taken together with the imperfection of analysis, prevents our reaching the same certainty about the vast majority of phenomena. Thus there are things that are uncertain for us, things more or less probable, and we seek to compensate for the impossibility of knowing them by determining their different degrees of likelihood. So it was that we owe to the weakness of the human mind one of the most delicate and ingenious of mathematical theories, the science of chance or probability." (Pierre-Simon Laplace,Recherches, 1º, sur l'Intégration des Équations Différentielles aux Différences Finies, et sur leur Usage dans la Théorie des Hasards", 1773)

"Probability has reference partly to our ignorance, partly to our knowledge [..] The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all cases possible is the measure of this probability, which is thus simply a fraction whose number is the number of favorable cases and whose denominator is the number of all cases possible." (Pierre-Simon Laplace, "Philosophical Essay on Probabilities", 1814)

"The facts of greatest outcome are those we think simple; may be they really are so, because they are influenced only by a small number of well-defined circumstances, may be they take on an appearance of simplicity because the various circumstances upon which they depend obey the laws of chance and so come to mutually compensate." (Henri Poincaré, "The Foundations of Science", 1913)

"The most important application of the theory of probability is to what we may call 'chance-like' or 'random' events, or occurrences. These seem to be characterized by a peculiar kind of incalculability which makes one disposed to believe - after many unsuccessful attempts - that all known rational methods of prediction must fail in their case. We have, as it were, the feeling that not a scientist but only a prophet could predict them. And yet, it is just this incalculability that makes us conclude that the calculus of probability can be applied to these events." (Karl R Popper,The Logic of Scientific Discovery", 1934)

"In relation to any experiment we may speak of this hypothesis as the null hypothesis, and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." (Ronald Fisher,The Design of Experiments", 1935)

"The fundamental difference between engineering with and without statistics boils down to the difference between the use of a scientific method based upon the concept of laws of nature that do not allow for chance or uncertainty and a scientific method based upon the concepts of laws of probability as an attribute of nature." (Walter A Shewhart, 1940)

"If the chance of error alone were the sole basis for evaluating methods of inference, we would never reach a decision, but would merely keep increasing the sample size indefinitely." (C West Churchman, "Theory of Experimental Inference", 1948)

"It will, of course, happen but rarely that the proportions will be identical, even if no real association exists. Evidently, therefore, we need a significance test to reassure ourselves that the observed difference of proportion is greater than could reasonably be attributed to chance. The significance test will test the reality of the association, without telling us anything about the intensity of association. It will be apparent that we need two distinct things:" (a) a test of significance, to be used on the data first of all, and" (b) some measure of the intensity of the association, which we shall only be justified in using if the significance test confirms that the association is real." (Michael J Moroney,Facts from Figures", 1951)

"People have erroneous intuitions about the laws of chance. In particular, they regard a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. The prevalence of the belief and its unfortunate consequences for psychological research are illustrated by the responses of professional psychologists to a questionnaire concerning research decisions." (Amos Tversky & Daniel Kahneman,Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)

"Averaging results, whether weighted or not, needs to be done with due caution and commonsense. Even though a measurement has a small quoted error it can still be, not to put too fine a point on it, wrong. If two results are in blatant and obvious disagreement, any average is meaningless and there is no point in performing it. Other cases may be less outrageous, and it may not be clear whether the difference is due to incompatibility or just unlucky chance." (Roger J Barlow,Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"[…] an honest exploratory study should indicate how many comparisons were made […] most experts agree that large numbers of comparisons will produce apparently statistically significant findings that are actually due to chance. The data torturer will act as if every positive result confirmed a major hypothesis. The honest investigator will limit the study to focused questions, all of which make biologic sense. The cautious reader should look at the number of ‘significant’ results in the context of how many comparisons were made." (James L Mills, "Data torturing", New England Journal of Medicine, 1993)

"To understand what kinds of problems are solvable by the Monte Carlo method, it is important to note that the method enables simulation of any process whose development is influenced by random factors. Second, for many mathematical problems involving no chance, the method enables us to artificially construct a probabilistic model" (or several such models), making possible the solution of the problems." (Ilya M Sobol, "A Primer for the Monte Carlo Method", 1994)

"Regression to the mean' […] says that, in any series of events where chance is involved, very good or bad performances, high or low scores, extreme events, etc. tend on the average, to be followed by more average performance or less extreme events. If we do extremely well, we're likely to do worse the next time, while if we do poorly, we're likely to do better the next time. But regression to the mean is not a natural law. Merely a statistical tendency. And it may take a long time before it happens." (Peter Bevelin,Seeking Wisdom: From Darwin to Munger",  2003)

"Each systematic error associated with a given measurement process is always of the same sign and magnitude. It persists measurement after measurement. When its existence is established, such an error is called a bias, and reasonable effort should be made to correct for it. Sometimes the observed bias is the result of the concurrence of several biases that cannot or at least have not been individually identified. One of the purposes of statistical treatment of data is to decide whether an apparently erroneous result is real and indicates a bias or whether it could happen as the result of chance variability, even in a well-behaved measurement system. There can be, of course, biases that have not been identified as such. Also, there are limits to how well one can correct for known biases, and this inadequacy must be considered when limits of uncertainty are assigned to data." (Cheryl Cihon & John K Taylor, "Statistical Techniques for Data Analysis" 2nd. ed., 2005)

"Probability is about making decisions under uncertainty - indeed, where there is no uncertainty, no decision is required, as you would simply choose the outcome that you know will occur. A 'good' or 'rational' decision favours the Cartesian principle that ‘when it is not in our power to follow what is true, we ought to follow what is most probable’. Of course, rational decisions sometimes turn out to be wrong. That does not mean that the decisions were bad - they may have been the best choices, given the information available at the time. […] In the long run, the vagaries of chance tend to even out, but in particular cases it can happen that the long shot comes in first. This is the corollary of a 'good' decision that has bad consequences - a 'bad' or 'irrational' decision that turns out to be right." (Alan Graham, "Developing Thinking in Statistics", 2006) 

"Regression toward the mean. That is, in any series of random events an extraordinary event is most likely to be followed, due purely to chance, by a more ordinary one." (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"In bagging, generating complementary base-learners is left to chance and to the unstability of the learning method. In boosting, we actively try to generate complementary base-learners by training the next learner boosting on the mistakes of the previous learners." (Ethem Alpaydin, "Introduction to Machine Learning" 2nd Ed, 2010)

"Be careful not to confuse clustering and stratification. Even though both of these sampling strategies involve dividing the population into subgroups, both the way in which the subgroups are sampled and the optimal strategy for creating the subgroups are different. In stratified sampling, we sample from every stratum, whereas in cluster sampling, we include only selected whole clusters in the sample. Because of this difference, to increase the chance of obtaining a sample that is representative of the population, we want to create homogeneous groups for strata and heterogeneous" (reflecting the variability in the population) groups for clusters." (Roxy Peck et al,Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"The closer that sample-selection procedures approach the gold standard of random selection - for which the definition is that every individual in the population has an equal chance of appearing in the sample - the more we should trust them. If we don’t know whether a sample is random, any statistical measure we conduct may be biased in some unknown way." (Richard E Nisbett,Mindware: Tools for Smart Thinking", 2015)

"Statistical significance is a concept used by scientists and researchers to set an objective standard that can be used to determine whether or not a particular relationship 'statistically' exists in the data. Scientists test for statistical significance to distinguish between whether an observed effect is present in the data" (given a high degree of probability), or just due to chance. It is important to note that finding a statistically significant relationship tells us nothing about whether a relationship is a simple correlation or a causal one, and it also can’t tell us anything about whether some omitted factor is driving the result." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"In statistics, the word 'significant' means that the results passed mathematical tests such as t-tests, chi-square tests, regression, and principal components analysis" (there are hundreds). Statistical significance tests quantify how easily pure chance can explain the results. With a very large number of observations, even small differences that are trivial in magnitude can be beyond what our models of change and randomness can explain. These tests don’t know what’s noteworthy and what’s not - that’s a human judgment." (Daniel J Levitin,Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well - but it may do so by chance. As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"A well-known theorem called the 'no free lunch' theorem proves exactly what we anecdotally witness when designing and building learning systems. The theorem states that any bias-free learning system will perform no better than chance when applied to arbitrary problems. This is a fancy way of stating that designers of systems must give the system a bias deliberately, so it learns what’s intended. As the theorem states, a truly bias- free system is useless." (Erik J Larson,The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Visualizations can remove the background noise from enormous sets of data so that only the most important points stand out to the intended audience. This is particularly important in the era of big data. The more data there is, the more chance for noise and outliers to interfere with the core concepts of the data set." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)
