SQL Troubles

01 June 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 194: How Weak‑Point Mapping Reveals the Hidden Cues AI Models Over‑Trust)

Prompt Engineering Series

Prompt: "write a post of 600 words on how weak‑point mapping in AI models allows to identify which types of hidden cues the model over‑trusts"

Introduction

As Artifacts Intelligence (AI) systems grow more capable, one of the most important challenges is understanding why they behave the way they do. Modern models don’t simply follow instructions; they respond to a complex mix of signals - some explicit, some subtle, and some completely unintended. This is where weak‑point mapping becomes a powerful diagnostic tool. It allows researchers to uncover which hidden cues an AI model over‑trusts, revealing blind spots that would otherwise remain invisible.

Weak‑point mapping is the process of systematically probing an AI model with carefully designed prompts to identify the specific patterns, phrases, or contextual signals that disproportionately influence its behavior. These weak points are not necessarily flaws in the traditional sense. Instead, they are over‑weighted cues - signals the model treats as more important than they should be. By mapping these cues, we gain insight into the model’s internal priorities and vulnerabilities.

One of the most striking aspects of weak‑point mapping is how it exposes latent biases in the model’s decision‑making hierarchy. AI systems learn from vast datasets, absorbing statistical patterns that may not align with human expectations. For example, a model might over‑trust authoritative‑sounding language, even when the content is incorrect. Or it might respond more strongly to emotionally charged phrasing, interpreting it as a cue to shift tone or urgency. These tendencies are rarely visible in everyday use, but weak‑point mapping brings them to the surface.

Another important insight comes from observing how models react to structural cues—the formatting, ordering, or framing of information. A model might treat bullet points as more reliable than paragraphs, or prioritize the last instruction in a sequence even when earlier instructions were more important. Weak‑point mapping helps identify these structural preferences by varying the format while keeping the content constant. When the model’s behavior changes dramatically, it signals a hidden dependency.

Weak‑point mapping also reveals how models handle conflicting signals. By presenting prompts that contain both strong and weak cues, researchers can see which ones the model prioritizes. For instance, a model might claim to follow safety rules, but a cleverly phrased request could override those rules if it triggers a cue the model over‑weights—such as a request framed as a system instruction. Identifying these override points is essential for building safer, more reliable AI systems.

One of the most valuable outcomes of weak‑point mapping is its ability to uncover semantic shortcuts - cases where the model relies on superficial correlations rather than deeper reasoning. For example, a model might associate certain keywords with specific actions, even when the surrounding context contradicts that association. By systematically altering the context while keeping the keywords, weak‑point mapping exposes these shortcuts and helps developers correct them.

The technique also highlights how models respond to social cues, such as politeness, urgency, or emotional tone. While these cues can be helpful in making AI interactions feel natural, over‑trusting them can lead to inconsistent or unsafe behavior. Weak‑point mapping helps determine whether the model is overly sensitive to these cues, ensuring that emotional framing does not override more important constraints.

Ultimately, weak‑point mapping is not just a debugging tool - it is a window into the model’s internal logic. By identifying the hidden cues an AI system over‑trusts, researchers can strengthen alignment, improve robustness, and reduce the risk of unintended behavior. In a world where AI systems are increasingly embedded in critical workflows, understanding these weak points is essential. Weak‑point mapping gives us the clarity we need to build models that are not only powerful, but also predictable, trustworthy, and aligned with human intent.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate feature's ability to answer standard general questions, independently on whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

✏️Christian Tominski - Collected Quotes

"A difficulty with combined bivariate visualizations is that the connection between the individual displays has to be established by the observer mentally. That is, as the eyes move from one bivariate display to the next, the observer has to keep track of the visited dots in order to form a complete understanding of data tuples. Visualization techniques based on polylines aim to tackle this difficulty. The basic strategy is to create m axes, one for each attribute, and n polylines, one for each data tuple. The polyline of an m-variate data tuple is constructed as follows. For each attribute value of the data tuple, a position is computed at the corresponding attribute axis. The m positions that we obtain are then connected to form the polyline that represents the entire tuple." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"A scatter plot consists of two orthogonally aligned axes that represent the value ranges of two data variables. Dots are placed in the space spanned by the axes in order to visualize the data elements. Conceptually, this corresponds to a mapping of data to position. A first data variable is mapped with respect to the horizontal x-axis, and a second variable with respect to the vertical y-axis." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"A stream graph is a technique for visualizing multivariate temporal data with a linear arrangement of time. As in the previous two examples, time is shown along the horizontal display axis from left to right. The multivariate data attributes are visualized as stacked streams, there is one stream for each attribute. The actual visual encoding is based on varying the thickness of the streams along the horizontal axis. That is, the vertical height of a stream at a particular horizontal position represents the underlying data value at the corresponding time. Various alternatives exist for ordering the streams and shaping the overall stack of streams." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"An important property of a data domain is its scale. The scale determines what relations and operations are possible for the data values in the domain. At the top level, we can differentiate qualitative (or categorical) and quantitative (or numerical) data. At a second level, we can further categorize qualitative data into nominal and ordinal data, and quantitative data into discrete and continuous data." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Description is all about characterizing an observation by the associated data elements, and thereby deriving a specification for an observation. For example, an outlier can be described by its characteristic values and, if available, its spatio-temporal context. A proper description may serve as a basis for configuring further analysis steps. In particular, a description allows for sharing first insights with other people, who can later be involved in verifying the analysis results." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Explanation means identifying all contributing data and finding the main causes behind an observation. This involves investigating several questions. Is the observation by itself significant or did we just interpret too much into the noise among the data? Does the observation re-occur throughout the data or are we looking at a singular outlier produced by unli kely circumstances? If the observation does re-occur, does it show up reliably under the same conditions, thus forming a pattern, or are its appearances seemingly random?" (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Node-link, matrix, and implicit representations are suited for different graph data. Node-link diagrams are good for sparse networks, which have a moderate number of edges. Dense networks with many edges are best visualized using a matrix. Trees, as we just said, are nicely represented by implicit approaches." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Often, finding the spatial scale that best matches the task at hand is a trial-and-error procedure. It may even be necessary to create further spatial scales by subsuming or subdividing spatial units. Coarser scales can be derived from the original scale by means of a suitable aggregation strategy. This includes the application of aggregation functions such as average, sum, or count. For the creation of finer scales, a suitable distribution strategy is required to assign data values to the newly specified sub-regions. Usually, additional context information is necessary to arrive at semantically meaningful aggregations and distribution." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"Presentation is to communicate confirmed analysis results. While explanation and confirmation were about convincing ourselves, presentation is about convincing others of what we have found in the data. This is best done by telling a story about the data, the analysis, and the results. Such a story can act at different levels of emphasis. We may inform an audience by letting the results speak for themselves, explicate the results to an audience, or even persuade an audience into agreement with the results. The audience in this context can be the listeners of a talk, the readers of an article, or colleagues participating in a scientific discussion." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The simple, yet very effective idea of table-based visualization is to retain the tabular layout of spreadsheets, but to replace the textual representation of data values by a visual representation. A visual representation will not only make the interpretation of the data much easier, it will also require less display space." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The advantage of sequencing views in time is that each view can fully utilize the display space. There is no need to divide the space among views. Obviously, sequencing views in time is particularly suited to convey temporal characteristics of data. It can also be helpful to take the user on a journey from one data facet to another. However, presenting views in quick succession to the user also has some limitations. For example, it could be difficult to make sense of all the information provided during a sequence of views. Especially when sequences take a long time, users may be unable to follow and could drown in an indigestible flood of visual representations. Therefore, it is mandatory to provide interactive controls to pause, slow down, reverse, and advance the presentation." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The cycle plot is a technique particularly designed for the combined visualization of linear and cyclic components of temporal data. The basic idea is to show the cyclic component as a line plot into which several smaller plots are embedded to visualize the linear component. As such, the cycle plot is a kind of nested visualization." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The triangular model is a technique particularly for visualizing intervals. It is based on two coordinate axes, the horizontal one representing time and the vertical one representing duration. In the triangular model, an interval is represented as a dot with two attached arms. The dot is placed so that the arms connect the time axis exactly at the start and the end of the represented interval. The point’s height corresponds to the interval’s duration." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"The triangular model is useful when it comes to reasoning about properties and the relationships of multiple intervals, because it generates easily distinguishable visual patterns for all possible interval relations. There is even room for visualizing data that might be associated with the intervals. The dot-based encoding would allow for resizing or coloring the dots based on some attribute values. Yet, the triangular model is only of limited use for multivariate attributes." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"When the data to be analyzed become more complex, it is no longer feasible to indiscriminately present each and every aspect of the data in a single view. When we reach this point, it makes sense to create several dedicated visual representations, each focused on communicating a particular aspect or facet of the data. The question is how several such views can be presented to the user in order to convey a comprehensive picture?" (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"With each variable being added to the visual mapping, the richness of the visual representation is increased. Theoretically, we could add yet another visual variable, for example, by texturing the shapes. However, from a practical point of view, there are limits. While a rich visual mapping opens up the possibility to make a wider range of analytic discoveries, the downside is that the mental effort required to digest the visual representation increases as well. Therefore, it is really important to balance the visual mapping according to the task and the data." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

31 May 2026

〽️Prompt Engineering: Copilot Unabridged (Part 193: How Instruction‑Priority Testing Reveals Whether AI Models Obey Visible or Invisible Instructions)

Prompt Engineering Series

Prompt: "write a post of 600 words on how instruction‑priority testing in AI models allows to see whether the model obeys visible or invisible instructions"

Introduction

In the rapidly evolving world of Artificial Intelligence (AI), one of the most important questions researchers and practitioners ask is deceptively simple: Which instructions does the model actually follow? Modern AI systems operate under layers of guidance—some visible to the user, others embedded deep within the model’s training or system‑level configuration. Understanding which instructions take priority is essential for safety, reliability, and transparency. This is where instruction‑priority testing comes into play.

Instruction‑priority testing is the practice of giving an AI model multiple, potentially conflicting instructions and observing which ones it chooses to obey. The goal is not to 'trick' the model but to map the hierarchy of influences acting on it. These influences can include user prompts, system‑level rules, safety constraints, and even subtle patterns learned during training. By intentionally creating controlled conflicts, researchers can see whether the model prioritizes visible instructions - the ones the user explicitly writes - or invisible instructions, such as safety rules, alignment constraints, or internal behavioral patterns.

At its core, instruction‑priority testing works because AI models do not simply execute commands. They interpret them. When a user writes a prompt, the model weighs that prompt against its internal rules and the broader context of the conversation. If the model consistently refuses to follow a user instruction, even when the instruction is clear and harmless, that signals the presence of a stronger, invisible rule. Conversely, if the model follows the user instruction even when it contradicts a system‑level guideline, that suggests the model is over‑prioritizing user input.

One of the most revealing aspects of instruction‑priority testing is how it exposes implicit behavior. For example, a model may be given a visible instruction to respond in a certain style, but an invisible instruction - such as a safety guideline - may override that style if the content touches on sensitive topics. This doesn’t mean the model is malfunctioning. It means the model is following a hierarchy designed to keep interactions safe and responsible. Instruction‑priority testing helps clarify where that hierarchy begins and ends.

Another benefit of this testing method is that it highlights model robustness. A well‑aligned model should consistently prioritize safety‑critical invisible instructions over user‑provided visible ones. If a model can be easily pushed into ignoring its own safeguards, that’s a sign of weak alignment. On the other hand, if a model rigidly follows invisible rules even when the user’s request is harmless and reasonable, that may indicate over‑alignment or inflexibility. Instruction‑priority testing helps strike the right balance.

The technique also sheds light on prompt sensitivity. Some models respond strongly to the phrasing or structure of a prompt, while others maintain stable behavior regardless of wording. By varying the visible instructions - changing tone, order, or specificity—researchers can see how easily the model’s priorities shift. If small changes in phrasing cause large changes in behavior, the model may be too sensitive to surface‑level cues. If the model ignores user phrasing entirely, it may be too anchored to internal rules.

Ultimately, instruction‑priority testing is not about catching AI models doing something wrong. It’s about understanding how they make decisions. In a world where AI systems are becoming more capable and more integrated into daily life, transparency around instruction hierarchy is essential. Users deserve to know when the model is following their guidance and when it is following deeper, invisible rules designed to ensure safety and consistency.

By systematically testing how models respond to conflicting instructions, we gain insight into their internal priorities, their alignment with human values, and their reliability in real‑world scenarios. Instruction‑priority testing is not just a diagnostic tool - it’s a window into the model’s decision‑making process, helping us build AI systems that are both powerful and trustworthy.

Previous Post <<||>> Next Post

📉Graphical Representation: Reality (Just the Quotes)

"Judgment must be used in the showing of figures in any chart or numerical presentation, so that the figures may not give an appearance of greater accuracy than their method of collection would warrant. Too many otherwise excellent reports contain figures which give the impression of great accuracy when in reality the figures may be only the crudest approximations. Except in financial statements, it is a safe rule to use ciphers whenever possible at the right of all numbers of great size. The use of the ciphers greatly simplifies the grasping of the figures by the reader, and, at the same time, it helps to avoid the impression of an accuracy which is not warranted by the methods of collecting the data." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"A fundamental value in the scientific outlook is concern with the best available map of reality. The scientist will always seek a description of events which enables him to predict most by assuming least. He thus already prefers a particular form of behavior. If moralities are systems of preferences, here is at least one point at which science cannot be said to be completely without preferences. Science prefers good maps." (Anatol Rapoport, "Science and the goals of man: a study in semantic orientation", 1950)

"It is really questionable - though bordering on heresy to put the question - whether we would be any the worse off if the whole bag of tricks were scrapped. So many of these index numbers are so ancient and so out of date, so out of touch with reality, so completely devoid of practical value when they have been computed, that their regular calculation must be regarded as a widespread compulsion neurosis. Only lunatics and public servants with no other choice go on doing silly things and liking it." (Michael J Moroney, "Facts from Figures", 1951)

"Data analysis typically begins with straight-line models because they are simplest, not because we believe reality is inherently linear. Theory or data may suggest otherwise [...]" (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"One important aspect of reality is improvisation; as a result of special structure in a set of data, or the finding of a visualization method, we stray from the standard methods for the data type to exploit the structure or the finding." (William S Cleveland, "Visualizing Data", 1993)

"Because 'reality' and 'truth' are essential in these figures, it is important to be straightforward and thoughtful in the selection of the areas to be used. Manipulation such as enlargement, reduction, and increase or decrease of contrast must not distort or change the information. Touch-up is permissible only to eliminate distracting artifacts. Labels should be used judiciously and sparingly, and should not hide or distract from important information." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"New information is constantly flowing in, and your brain is constantly integrating it into this statistical distribution that creates your next perception (so in this sense 'reality' is just the product of your brain’s ever-evolving database of consequence). As such, your perception is subject to a statistical phenomenon known in probability theory as kurtosis. Kurtosis in essence means that things tend to become increasingly steep in their distribution [...] that is, skewed in one direction. This applies to ways of seeing everything from current events to ourselves as we lean 'skewedly' toward one interpretation, positive or negative. Things that are highly kurtotic, or skewed, are hard to shift away from. This is another way of saying that seeing differently isn’t just conceptually difficult - it’s statistically difficult." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"Any chart is a simplification of reality, and it reveals as much as it hides. Therefore, it’s always worth asking ourselves: What other patterns or trends may be hidden behind the data displayed on the chart?" (Alberto Cairo, "How Charts Lie", 2019)

"No chart can ever capture reality in all its richness. However, a chart can be made worse or better depending on its ability to strike a balance between oversimplifying that reality and obscuring it with too much detail." (Alberto Cairo, "How Charts Lie", 2019)

🎯C S V Murthy - Collected Quotes

"[a scatter diagram] is a graph in which the values of two variables are plotted along two axes, the pattern of the resulting points revealing any correlation present. It graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve. The better the correlation, the tighter the points will have the line." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Decision tree is a graphical representation of a decision situation in which decision situation points (nodes) are connected together by arcs (one for each alternative on a decision) and terminate in ovals (the action that is the result of all the decisions made on the path leading to that oval). [...] A tree is made up of multilevel group of elements called nodes. A node is nothing more than a point at which subsidiary data originate. This particular logical data structure is called a tree simply because it looks like a tree, usually turned upside down. Genealogists use a schema called a tree to show ancestral descent of a person, family or group. Data associated by a tree schema are hierarchical. They branch from a point or node without forming loops or polygons. Data presented in a tree structure make two conditions. First, the tree must have a single root node. Second, all nodes other than the root node must be related to one and only one higher level node." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Every interaction includes both presentation and dialogue. Presentation provides the layout of information on a computer screen. Dialogue provides an interaction sequence between a user and computer. Interfaces and dialogue will help users to solve their problems. Presentation must include objects that the user can readily understand in terms of their daily work. The dialogue must correspond to user’s normal work and to their mental model of the system (Mental model is the way a user sees a problem). Both presentation/dialogue depend on what users are doing." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Information relevance refers to the extent to which information is appropriate for the decision-making situation facing the manager. Extraneous or extra information distracts the decision-maker from the assigned task and information overload frustrates the decision-maker and impairs the decision-making process. Relevant information must pertain to the problems, decisions and responsibilities of the recipient." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Information that is complete means information that covers key issues and is sufficient to support the decision-making situation at hand without critical omissions. The more complete a body of information, is obviously, the more expensive it is to develop and maintain. Care must also be taken not to provide extra information than needed, due to its expense, and not to provide so much information that the recipient will suffer from information overload (information indigestion)." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Ridge Regression is a technique for analysing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large. So, they may be far from the true value. By adding a degree of bias to the regression estimates, principal components regression reduces the standard errors." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Spectral methods are a class of techniques used in applied mathematics and scientific computing to numerically solving certain differential equations, potentially involving the use of the fast Fourier transform. This is an algorithm that samples a signal over a period of time and divides it into its frequency components. These components are single sinusoidal oscillations at distant frequencies each with their own amplitude and phase." (C S V Murthy, "Data and Businesss Analytics", 2020)

"The concept of programmed decisions is important because the ultimate (and unachievable) goal of information systems is to provide purely programmed decisions. Because this is not possible, we seek to provide the optimum type of information to the human decision-maker, who then makes non-programmable decisions. Decisions lend themselves to programming techniques if they are repetitive and routine, and if a procedurs can be worked out for handling them so that each is neither an ad hoc decision nor one to be treated as a new situation each time it arises." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Timeliness means that information is available when it is needed. Most managers function in a dynamic environment of change, demands updated and current information. Computerised information systems have the ability to gather, sort, analyse, store, retrieve, and transmit large amounts of information in a very short period of time. Completeness of information is the extent to which information is all there." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Understanding complex information systems begins with a clear understanding of information and its general characteristics. Information can be considered as the very blood of an organisation, but it must be properly understood and appropriately distinguished from data. Too many times, the terms ‘data’ and ‘information’ are used interchangeably, but the terms most clearly mean entirely different things. Data should be defined as raw, unsummarised and unanalysed facts. Information is data that has been presented in such a way as to alter the receiver’s understanding. Data are the raw materials from which information is derived. This is a necessary distinction for the manager to make, because loads of data can be generated, without producing even one iota of useful information." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Visualisation is any technique for creating images, diagrams or animations to communicate a message; techniques used to communicate data or information by encoding it as visual objects, e.g., points, lines or bars contained in graphics. One of the most important benefits of visualisation is that it allows us visual access to huge amounts of data in easily digestible visuals. Well designed data graphics are usually the simplest, and at the same time, the most powerful." (C S V Murthy, "Data and Businesss Analytics", 2020)

30 May 2026

✏️Gerald Benoît - Collected Quotes

"A model links to the viewers’ engagement with the visualization. Can the viewers identify the purpose and create a relationship in their mind between the nascent message of your visualization and their knowledge and work practices? When sketching out the design and considering the data, what is the first intention of the design? How will viewers interpret the goal of the visualization?" (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"A well-designed 'information visualization' is interactive, allowing viewers to converse with the data: gaining knowledge, exposing insights, and engaging with the data in unexpected ways. It is only through these conversations that the otherwise static display of data transforms into meaningful information." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Before progressing to analysis and visualization of the data, examine the data for inconsistencies and missing values. Data that fall outside an expected range, values that are missing or null, or have a different encoding or data type need to be addressed." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Contemporary information specialists should at least be conversant in the pros/cons, benefits and liabilities, tech and data requirements of each software product they might use." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Experience shows that both neophyte designers of visualizations and commercial visualization applications often overlook the role that type plays in legibility, aesthetics, and meaning construction. Yet the most successful visualizations are those where the details of data, design, and aesthetics are in harmony, and the interactivity allows the end user to understand the explanation and to explore." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"For an information visualization specialist, we must weigh the impact of the purely visual aspects of our designs as well applying visual norms that facilitate interpretation. Finally, we integrate data as the foundation of the visualization - all in a way where each coheres—that is, each contributes the same message to the viewer albeit in different languages (textual, data, interactive, and visual). It’s not useful nor possible to study themes of the aesthetic, technical, and applications of visuals independently of the others." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Information visualization displays meet the definition of an art form in that there is an intended message to be communicated, and the principles of graphic design are applied as they are in other information graphics. Unlike other forms of representational art, InfoVis is a representational art of 'information' as an abstract phenomenon, with the goal of engaging the viewer with forms of interactivity that are not possible with a painting." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"Knowing what graphic representation to apply is partially a function of the data themselves and partially from the designer’s understanding of the target audience viewing the graphic. The Internet and publications have many recommended charting types." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"The problem-solving approach favored in the big data/data science realm is datacentric. This is likely because of the similarities between traditional data- and text-mining activities that incorporate visualizing results for exploration and explanation. This field contributes to receptiveness by institutions and the public to very large datasets and the computational infrastructure that provides the data. For data scientists, however, the ultimate interest is using visuals to help chart the data, as opposed to interacting with them. The emphasis is on large datasets and machine learning." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"The rule of thirds applies to fonts, too. The use of fonts is more subtle than one might imagine at first glance. The extreme subtlety of detail when designing fonts contributes to an equally subtle affective impact on a design. The choice of fonts also contributes more evidently to legibility. To a graphic designer, the choice of font contributes to the overall design, addressing more than legibility because the design is tempered with sensitivity to the limitations of the output device (monitor), size of the font, and the overall aesthetic tone." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

" [...] the rule of three applies to the choice of typography, too. In design practice, there is usually a heading font, body text, and then a font for details. [...] Even though two of the roles (title and body) are the same font name, one is bold and the other is regular. This equates to two fonts. It is common, too, to use a serif font for a title and then a sans serif for the other two (or vice versa). Learning which fonts to use comes only from practice and studying examples." (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

"When teaching design composition for posters and for websites, there are some introductory rules [...]. One is the 'rule of thirds'. This equates to (no more than) three colors in the design, three typefaces, and three display areas in a design composition [...]" (Gerald Benoît,"Introduction to Information Visualization: Transforming Data into Meaningful Information", 2019)

📉Graphical Representation: Projections (Just the Quotes)

"Whatever relates to extent and quantity may be represented by geometrical figures. Statistical projections which speak to the senses without fatiguing the mind, possess the advantage of fixing the attention on a great number of important facts." (Alexander von Humboldt, 1811)

"Business executives, to be efficient, must constantly plan ahead, but there are pitfalls in attempting to estimate the future growth of a business from a chart of its past history. In the first place, there are too many uncontrollable factors entering into the situation to make the most careful estimate of future growth anything more than a shrewd guess, dependent upon all internal and external conditions remaining the same. To project the growth curve of a business into the future provides a good mark to shoot at, but a bank loan is seldom obtainable on the strength of such a projection." (Walter E Weld, "How to Chart; Facts from Figures with Graphs", 1959)

"Charts not only tell what was, they tell what is; and a trend from was to is (projected linearly into the will be) contains better percentages than clumsy guessing." (Robert A Levy, "The Relative Strength Concept of Common Stock Forecasting", 1968)

"There is no end to the information we can use. A 'good' map provides the information we need for a particular purpose - or the information the mapmaker wants us to have. To guide us, a map’s designers must consider more than content and projection; any single map involves hundreds of decisions about presentation." (Peter Turchi, "Maps of the Imagination: The writer as cartographer", 2004)

"The first thing you must understand is that information design is not limited to the visualization of data, in presentation design or any other application. It can and should be used to visualize other concepts such as hierarchy (org charts), anatomy (portfolio allocation), and chronology (timeline of events). Beyond the bar graphs showing sales figures and monthly projections, there are many more opportunities to explain concepts with visuals that will engage your audience and clarify your key points." (Jason Lankow et al, "Infographics: The power of visual storytelling", 2012)

"Conceptually, mosaic plots for s + 1 factors in strength s designs can be used for any s; in practice, the idea is limited by space constraints, especially for accommodating labels for the factor levels. All four margins are used for four-factor projections; with the next dimension, one margin has to be used for two factors. In practice, one will rarely consider mosaic plots for more factors than four at a time." (Ulrike Grömping, "Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs", The American Statistician Vol. 68 (2), 2014)

"A well-designed graph clearly shows you the relevant end points of a continuum. This is especially important if you’re documenting some actual or projected change in a quantity, and you want your readers to draw the right conclusions. […]" (Daniel J Levitin, "Weaponized Lies", 2017)

"All maps lie because they are based on the principle of projecting a spherical surface, the Earth, onto a plane. All maps distort some geographic feature, such as the sizes of the areas represented or the shapes of those areas." (Alberto Cairo, "How Charts Lie", 2019)

✏️ Leandro N de Castro - Collected Quotes

"A bar chart is similar to a line chart, except that each data point is replaced by a rectangle with a height proportional to the value. The rectangle is usually centered on the spatial attribute of the data, and its width is often uniform. When values are categorical or discrete and cannot be shown in a series, a bar chart may be a suitable alternative for the line chart. Similarly to the case of a line chart, it is possible to create multivariate bar charts by stack‑ing the bars on top of each other in a form of superimposition easy to interpret." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"A scatterplot is a data visualization graph that uses dots to represent the relationship between two quantitative variables. One variable, called the explanatory variable, is plotted on the x‑axis, and the other variable, called the response variable, is plotted on the y‑axis. It is also possible to include a third categorical variable, represented by different dot colors. Each dot represents an individual data point, and the colors, when used, represent the categories of the dots. Therefore, the data point is organized into two or three columns, one for each variable, and each data point is plotted on the graph using two coordinates, one for each variable, with various colors representing each category.,." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Closure is a feature related to our capability of completing (closing) an object or a shape that is incomplete, that is, one that has some parts missing. The preattentive processing of closure is also automatic, not requiring conscious effort. For example, when looking at any shape, e.g., a circle or a square, with a small part missing, our brain automatically and preattentively perceives whether the shape is incomplete and fills these gaps. Preattentive processing of closure can be used in visual communication to create recognizable symbols and logos." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Color is a powerful visual tool to encode data and convey different meanings, such as categories, magnitude, visual hierarchy, and even emotions. Using different hues, saturations, and brightness levels can help differentiate between categories or show patterns in the data." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Curvature is another preattentive feature that leads to a fast detection of changes in the degree of curvature, bending, or angularity of a shape or line, such as the presence of a more or less curved line in a group of otherwise similar lines. The degree of curvature in a line or shape can be used to represent different quantities or values, for instance, a smaller or larger number of peaks in a function." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Data visualization, by contrast, focuses on the visual representation of data in such a way that its values, structure, nature, type, and variability are accurately expressed by means of graphs. It aims to support the exploration and understanding of data, the identi‑fication of patterns, trends, distributions, correlations, and anomalies, the communicationof insights, and aid in decision‑making." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Differences in orientation can help us differentiate between items (e.g., data points, lines, objects, etc.) or extract information about the data. For example, using vertical bars in a bar chart can help differentiate between categories, while using horizontal bars can emphasize the magnitude of the data. Angles and direction can be used to convey information, such as trends, movement, sense of depth, or changes in values." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"In data visualization, texture is the visual quality of an object related to its roughness, pattern, or smoothness. It can be created using a variety of techniques, for example, using different line styles, brushes, patterns, and even special effects. Differences in texture can help distinguish between data points or objects, create visual hierarchies, or convey infor‑mation about the data. For example, using different textures for different categories can help viewers quickly identify and differentiate patterns. Like the other features described here, the texture is usually processed preattentively, without the need for focused attention." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Length is another preattentive visual property that can be used to create visual contrast, differences, importance, and proportions. The perception of differences in length normally occurs automatically and rapidly, without conscious effort or attention. It can be used in visual communication to quickly draw attention to important information or to create a visual hierarchy. For example, in a graph, longer bars may indicate larger values or quanti‑ties; in a map, longer lines may indicate longer distances; in a drawing, longer items may convey a sense of flow, etc." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Line charts are useful for identifying patterns and trends in a one‑dimensional sequence of univariate data, that is, continuous data over time with a single value per data item. They map the sequence data (e.g., time) to one dimension, typically the x‑axis, and the data value to another dimension, typically the y‑axis, forming a line; or to the color of a mark or region along the spatial axis, forming a bar. The data is adjusted in size to be within the limits of the display attribute." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Preattentive features, such as color, shape, orientation, and size, are those basic visual properties that are processed automatically, without conscious effort or attention. By understanding preattentive features, data analysts can create effective data visualization designs that make use of them to convey information more efficiently and accurately to the audience." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Size is a preattentive feature that exerts a similar effect in vision as that exerted by the line width, that is, to detect differences quickly and automatically in items (e.g., objects, data points, font sizes, etc.). Differences in size can draw attention to specific data points, indicate hierarchy, emphasize specific items, or convey information about the magnitude of the data. Variation in size can be used to represent different quantities or values, where larger sizes may indicate higher values or importance, while smaller sizes may indicate lower values or importance." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Preattentive processing of 3D (three‑dimensional) properties allows us to detect the depth and spatial relationships between objects, such as the presence of an object that appears to be closer or farther away than the others, without the need for focused attention. Perspective, lighting, size, or shading can be used to create the illusion of depth and convey information, such as relationships between variables." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The histogram is a useful visualization technique to explore the pattern of a single variable distribution, where the x‑axis represents the range of values, and the y‑axis represents the absoluteor relative frequency of data points within each bin. Histograms allow the exploration of cen‑tral tendency measures, such as the mean and median; dispersion measures, such as the stan‑dard deviation; and range, and shape, such as skewness and kurtosis. It also helps to identify outliers or unusual values and to reveal potential biases or errors in the data collection process." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The preattentive processing of density occurs automatically and rapidly, without conscious effort or attention, and can be used in visual communication to create contrast and emphasize importance or relevance. This feature can be swiftly detected by the presence of varying numbers of objects (e.g., data points or shapes) in a given region of the space, rep‑resenting different quantities or values. For instance, in a chart or graph, a higher density of data points can be used to represent a larger quantity, a more significant trend, or a more exciting or energetic area. By making use of the preattentive processing of density, design‑ers can create effective visual designs that convey information quickly and efficiently to the viewer." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The preattentive processing of markings (e.g., stripes, dots, crosses, stars, hatchings, etc.) includes various visual properties, such as texture, shading, and patterns. These properties allow us to swiftly detect differences and similarities between objects or regions, such as the presence of a repeating pattern in a group of otherwise random shapes. The presence or absence of certain markings, such as dots or squares, can be used to represent different categories or values." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of closure states that incomplete objects are perceived as complete because our brain tends to fill the gaps to create the complete image. Note that closure is also a pre‑attentive feature and thus plays a key role not only in the quick filling of gaps or completion of shapes, but also in the organization of the information to be conveyed."(Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of common fate proposes that objects that move together or change similarly tend to be perceived as a group or a pattern. In this case, graphs that allow visualizing data obeying this principle will have to embody a type or a sense of motion. To illustrate this principle, let us consider a motion chart, a streamgraph, and a force‑directed graph. The motion chart is a visualization method that shows how data changes over time; the streamgraph is a stacked area graph that shows the changes in a set of data over time; and the force‑directed graph is a network visualization that shows the relationships of nodes in a graph. In all cases, there is a sense of common fate in the data." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of continuity states that objects that are arranged in a smooth, continuous way are more likely to be perceived as a single object, even if their pattern is interrupted. The line chart, the Sankey diagram, and the scatterplot are good examples of the principle of continuity in the use of Gestalt theory in data visualization." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of figure‑ground, also called figure‑field, states that objects are perceived as either being in the foreground or the background. One way of forcing this principle is by using contrasting colors in the background and foreground of an image, for instance, black and white, blue and orange, green and purple, red and green, yellow and purple, pink and green, and others. However, many of these pairs are not suitable for technical and scientific works, and thus, the recommendation is to use colors with parsimony." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of proximity proposes that objects that are close to one another tend to be perceived as a group or a pattern. In data visualization, the heatmap, the scatterplot, and the bar chart are good examples of methods that account for the principle of proximity. The heatmap is a graph in which the values of a matrix are represented by colors, which are a preattentive feature, and neighboring cells in the matrix convey a sense of organization and relationship. The scatterplot places similar data values close to one another, grouping them in the plot. In a bar chart, related data values are placed close together in the bars, allowing a visual association among them." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of similarity proposes that objects that share similar characteristics, such as color or form, tend to be perceived as a group or a pattern. Examples of data visualization techniques that account for the similarity principle in Gestalt theory include a line chart in which lines representing different categories have the same style, a bar chart in which the bar patterns or colors indicate the same group or category, and a scatterplot with different markers representing different categories of categorical variables." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The principle of symmetry states that objects that are symmetrical, or have a balanced appearance, tend to be perceived as a group or a pattern. Some data visualization graphs that can be used to explore this principle are the boxplot with boxes symmetrically placed around the median (Q2), the radar chart displaying multivariate data as a bidimensional chart with quantitative variables, and the mirrored bar chart with two sets of bars with mirrored values displayed." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"Preattentive processing of position allows us to quickly detect changes in location, such as the presence of a dot or other object that is slightly displaced from the others. The spa‑tial location of visual elements can also be used to guide the viewer’s attention or encode information, such as ranking, hierarchy, or relationship (grouping)." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

"The preattentive processing of shape is a basic visual property that enables us to swiftly

detect similarities and differences between items based on their shape, without requir‑

ing conscious effort or attention. For instance, in a picture with squares and circles, one

can quickly differentiate one from the other based on their shapes. Similarly, using differ‑

ent shapes for different forms or categories, or using a shape that is indicative of the data (e.g., a circle for data on a map), can help viewers quickly identify patterns." (Leandro N de Castro, "Exploratory Data Analysis: Descriptive Analysis, Visualization, and Dashboard Design", 2025)

29 May 2026

📉Graphical Representation: Uncertainty (Just the Quotes)

"A histogram consists of the outline of bars of equal width and appropriate length next to each other. By connecting the frequency values at the position of the nominal values" (the midpoints of the intervals) with straight lines, a frequency polygon is obtained. Attaching classes with frequency zero at either end makes the area" (the integral) under the frequency polygon equal to that under the histogram." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"For linear dependences the main information usually lies in the slope. It is obvious that those points that lie far apart have the strongest influence on the slope if all points have the same uncertainty. In this context we speak of the strong leverage of distant points; when determining the parameter 'slope' these distant points carry more effective weight. Naturally, this weight is distinct from the 'statistical' weight usually used in regression analysis." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is important to pay heed to the following detail: a disadvantage of logarithmic diagrams is that a graphical integration is not possible, i.e., the area under the curve" (the integral) is of no relevance." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Perception requires imagination because the data people encounter in their lives are never complete and always equivocal. [...] We also use our imagination and take shortcuts to fill gaps in patterns of nonvisual data. As with visual input, we draw conclusions and make judgments based on uncertain and incomplete information, and we conclude, when we are done analyzing the patterns, that out picture is clear and accurate. But is it?" (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"After you visualize your data, there are certain things to look for […]: increasing, decreasing, outliers, or some mix, and of course, be sure you’re not mixing up noise for patterns. Also note how much of a change there is and how prominent the patterns are. How does the difference compare to the randomness in the data? Observations can stand out because of human or mechanical error, because of the uncertainty of estimated values, or because there was a person or thing that stood out from the rest. You should know which it is." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"The data is a simplification - an abstraction - of the real world. So when you visualize data, you visualize an abstraction of the world, or at least some tiny facet of it. Visualization is an abstraction of data, so in the end, you end up with an abstraction of an abstraction, which creates an interesting challenge. […] Just like what it represents, data can be complex with variability and uncertainty, but consider it all in the right context, and it starts to make sense." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Estimates based on data are often uncertain. If the data were intended to tell us something about a wider population (like a poll of voting intentions before an election), or about the future, then we need to acknowledge that uncertainty. This is a double challenge for data visualization: it has to be calculated in some meaningful way and then shown on top of the data or statistics without making it all too cluttered." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"In statistics, 'error' is not a synonym for 'mistake', but rather a synonym for 'uncertainty.' Error means that any estimate we make, no matter how precise it looks in our chart or article [...] is usually a middle point of a range of possible values." (Alberto Cairo, "How Charts Lie", 2019)

"Uncertainty confuses many people because they have the unreasonable expectation that science and statistics will unearth precise truths, when all they can yield is imperfect estimates that can always be subject to changes and updates." (Alberto Cairo, "How Charts Lie", 2019)

28 May 2026

🔭Data Science: Chance (Just the Quotes)

"The universal cause is one thing, a particular cause another. An effect can be haphazard with respect to the plan of the second, but not of the first. For an effect is not taken out of the scope of one particular cause save by another particular cause which prevents it, as when wood dowsed with water, will not catch fire. The first cause, however, cannot have a random effect in its own order, since all particular causes are comprehended in its causality. When an effect does escape from a system of particular causality, we speak of it as fortuitous or a chance happening […]" (Thomas Aquinas, "Summa Theologica", cca. 1266-1273)

"[…] chance, that is, an infinite number of events, with respect to which our ignorance will not permit us to perceive their causes, and the chain that connects them together. Now, this chance has a greater share in our education than is imagined. It is this that places certain objects before us and, in consequence of this, occasions more happy ideas, and sometimes leads us to the greatest discoveries […]" (Claude A Helvetius, "On Mind", 1751)

"But ignorance of the different causes involved in the production of events, as well as their complexity, taken together with the imperfection of analysis, prevents our reaching the same certainty about the vast majority of phenomena. Thus there are things that are uncertain for us, things more or less probable, and we seek to compensate for the impossibility of knowing them by determining their different degrees of likelihood. So it was that we owe to the weakness of the human mind one of the most delicate and ingenious of mathematical theories, the science of chance or probability." (Pierre-Simon Laplace,Recherches, 1º, sur l'Intégration des Équations Différentielles aux Différences Finies, et sur leur Usage dans la Théorie des Hasards", 1773)

"Probability has reference partly to our ignorance, partly to our knowledge [..] The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all cases possible is the measure of this probability, which is thus simply a fraction whose number is the number of favorable cases and whose denominator is the number of all cases possible." (Pierre-Simon Laplace, "Philosophical Essay on Probabilities", 1814)

"The facts of greatest outcome are those we think simple; may be they really are so, because they are influenced only by a small number of well-defined circumstances, may be they take on an appearance of simplicity because the various circumstances upon which they depend obey the laws of chance and so come to mutually compensate." (Henri Poincaré, "The Foundations of Science", 1913)

"The most important application of the theory of probability is to what we may call 'chance-like' or 'random' events, or occurrences. These seem to be characterized by a peculiar kind of incalculability which makes one disposed to believe - after many unsuccessful attempts - that all known rational methods of prediction must fail in their case. We have, as it were, the feeling that not a scientist but only a prophet could predict them. And yet, it is just this incalculability that makes us conclude that the calculus of probability can be applied to these events." (Karl R Popper,The Logic of Scientific Discovery", 1934)

"In relation to any experiment we may speak of this hypothesis as the null hypothesis, and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." (Ronald Fisher,The Design of Experiments", 1935)

"The fundamental difference between engineering with and without statistics boils down to the difference between the use of a scientific method based upon the concept of laws of nature that do not allow for chance or uncertainty and a scientific method based upon the concepts of laws of probability as an attribute of nature." (Walter A Shewhart, 1940)

"If the chance of error alone were the sole basis for evaluating methods of inference, we would never reach a decision, but would merely keep increasing the sample size indefinitely." (C West Churchman, "Theory of Experimental Inference", 1948)

"It will, of course, happen but rarely that the proportions will be identical, even if no real association exists. Evidently, therefore, we need a significance test to reassure ourselves that the observed difference of proportion is greater than could reasonably be attributed to chance. The significance test will test the reality of the association, without telling us anything about the intensity of association. It will be apparent that we need two distinct things:" (a) a test of significance, to be used on the data first of all, and" (b) some measure of the intensity of the association, which we shall only be justified in using if the significance test confirms that the association is real." (Michael J Moroney,Facts from Figures", 1951)

"People have erroneous intuitions about the laws of chance. In particular, they regard a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. The prevalence of the belief and its unfortunate consequences for psychological research are illustrated by the responses of professional psychologists to a questionnaire concerning research decisions." (Amos Tversky & Daniel Kahneman,Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)

"Averaging results, whether weighted or not, needs to be done with due caution and commonsense. Even though a measurement has a small quoted error it can still be, not to put too fine a point on it, wrong. If two results are in blatant and obvious disagreement, any average is meaningless and there is no point in performing it. Other cases may be less outrageous, and it may not be clear whether the difference is due to incompatibility or just unlucky chance." (Roger J Barlow,Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"[…] an honest exploratory study should indicate how many comparisons were made […] most experts agree that large numbers of comparisons will produce apparently statistically significant findings that are actually due to chance. The data torturer will act as if every positive result confirmed a major hypothesis. The honest investigator will limit the study to focused questions, all of which make biologic sense. The cautious reader should look at the number of ‘significant’ results in the context of how many comparisons were made." (James L Mills, "Data torturing", New England Journal of Medicine, 1993)

"To understand what kinds of problems are solvable by the Monte Carlo method, it is important to note that the method enables simulation of any process whose development is influenced by random factors. Second, for many mathematical problems involving no chance, the method enables us to artificially construct a probabilistic model" (or several such models), making possible the solution of the problems." (Ilya M Sobol, "A Primer for the Monte Carlo Method", 1994)

"Regression to the mean' […] says that, in any series of events where chance is involved, very good or bad performances, high or low scores, extreme events, etc. tend on the average, to be followed by more average performance or less extreme events. If we do extremely well, we're likely to do worse the next time, while if we do poorly, we're likely to do better the next time. But regression to the mean is not a natural law. Merely a statistical tendency. And it may take a long time before it happens." (Peter Bevelin,Seeking Wisdom: From Darwin to Munger", 2003)

"Each systematic error associated with a given measurement process is always of the same sign and magnitude. It persists measurement after measurement. When its existence is established, such an error is called a bias, and reasonable effort should be made to correct for it. Sometimes the observed bias is the result of the concurrence of several biases that cannot or at least have not been individually identified. One of the purposes of statistical treatment of data is to decide whether an apparently erroneous result is real and indicates a bias or whether it could happen as the result of chance variability, even in a well-behaved measurement system. There can be, of course, biases that have not been identified as such. Also, there are limits to how well one can correct for known biases, and this inadequacy must be considered when limits of uncertainty are assigned to data." (Cheryl Cihon & John K Taylor, "Statistical Techniques for Data Analysis" 2nd. ed., 2005)

"Probability is about making decisions under uncertainty - indeed, where there is no uncertainty, no decision is required, as you would simply choose the outcome that you know will occur. A 'good' or 'rational' decision favours the Cartesian principle that ‘when it is not in our power to follow what is true, we ought to follow what is most probable’. Of course, rational decisions sometimes turn out to be wrong. That does not mean that the decisions were bad - they may have been the best choices, given the information available at the time. […] In the long run, the vagaries of chance tend to even out, but in particular cases it can happen that the long shot comes in first. This is the corollary of a 'good' decision that has bad consequences - a 'bad' or 'irrational' decision that turns out to be right." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Regression toward the mean. That is, in any series of random events an extraordinary event is most likely to be followed, due purely to chance, by a more ordinary one." (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"In bagging, generating complementary base-learners is left to chance and to the unstability of the learning method. In boosting, we actively try to generate complementary base-learners by training the next learner boosting on the mistakes of the previous learners." (Ethem Alpaydin, "Introduction to Machine Learning" 2nd Ed, 2010)

"Be careful not to confuse clustering and stratification. Even though both of these sampling strategies involve dividing the population into subgroups, both the way in which the subgroups are sampled and the optimal strategy for creating the subgroups are different. In stratified sampling, we sample from every stratum, whereas in cluster sampling, we include only selected whole clusters in the sample. Because of this difference, to increase the chance of obtaining a sample that is representative of the population, we want to create homogeneous groups for strata and heterogeneous" (reflecting the variability in the population) groups for clusters." (Roxy Peck et al,Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"The closer that sample-selection procedures approach the gold standard of random selection - for which the definition is that every individual in the population has an equal chance of appearing in the sample - the more we should trust them. If we don’t know whether a sample is random, any statistical measure we conduct may be biased in some unknown way." (Richard E Nisbett,Mindware: Tools for Smart Thinking", 2015)

"Statistical significance is a concept used by scientists and researchers to set an objective standard that can be used to determine whether or not a particular relationship 'statistically' exists in the data. Scientists test for statistical significance to distinguish between whether an observed effect is present in the data" (given a high degree of probability), or just due to chance. It is important to note that finding a statistically significant relationship tells us nothing about whether a relationship is a simple correlation or a causal one, and it also can’t tell us anything about whether some omitted factor is driving the result." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"In statistics, the word 'significant' means that the results passed mathematical tests such as t-tests, chi-square tests, regression, and principal components analysis" (there are hundreds). Statistical significance tests quantify how easily pure chance can explain the results. With a very large number of observations, even small differences that are trivial in magnitude can be beyond what our models of change and randomness can explain. These tests don’t know what’s noteworthy and what’s not - that’s a human judgment." (Daniel J Levitin,Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well - but it may do so by chance. As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"A well-known theorem called the 'no free lunch' theorem proves exactly what we anecdotally witness when designing and building learning systems. The theorem states that any bias-free learning system will perform no better than chance when applied to arbitrary problems. This is a fancy way of stating that designers of systems must give the system a bias deliberately, so it learns what’s intended. As the theorem states, a truly bias- free system is useless." (Erik J Larson,The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Visualizations can remove the background noise from enormous sets of data so that only the most important points stand out to the intended audience. This is particularly important in the era of big data. The more data there is, the more chance for noise and outliers to interfere with the core concepts of the data set." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

27 May 2026

📉Graphical Representation: Nomographs (Just the Quotes)

"The term nomography serves to designate the general study of the graphic representation of equations in any number of variables on a plane surface. Its practical applications consist in the representation of the numerical relations between the variables by calibrated systems (straight lines or curves) constructed once for all and permitting the determination by a single reading of one or more of the variables when the others are given." (Howard G Funkhouser," Historical Development of the Graphical Representation of Statistical Data", 1937)

"Now the condition that in the intersection chart three straight lines shall meet in a point is identical with the condition that in the corresponding alignment chart three points shall lie on a straight line. This is called the 'principle of duality', and this condition is given in the form of a determinant, known in nomography as the 'basic nomogram determinant', which enables us to plot the three scales of a nomogram, whether they are straight or curved, on squared paper. Whenever it proves possible to transform an equation into the form of a basic nomogram determinant a true nomogram can be drawn, but only too frequently this proves to be impossible and recourse must be had to graphical methods." (Philip Lyle, "The Construction of Nomograms for Use in Statistics: Part I. True and Empirical", Journal of the Royal Statistical Society - Series C (Applied Statistics) Vol. 3 (2), 1954)

"A nomograph of a formula is a graph or diagram composed of lines scaled relatively and placed in such relative positions that the values of the variables are found on a line crossing the scales. The object is to substitute for the labor of computation a simple mechanical operation such as the one previously described. It is easy to read a nomogram with precision because of the few lines. It provides a tabulation of all possible values, enables solutions to be made irrespective of what quantity in the formula is unknown and also enables one to observe instantly the effect of a change, either small or great, in any one of the variables. The principles of such diagrams may be given in a general way and simple nomograms be constructed, but equations with many unknown quantities cannot be solved graphically without higher mathematics." (William C Marshall, "Graphical methods for schools, colleges, statisticians, engineers and executives", 1921)

"Nomograms are graphic devices for representing equations on a plane surface. They are widely used in engineering design and to a lesser extent in the social and physical sciences. Nomograms can be divided into two classes, or distinct graphic formats: (i) Abac: Equation drawn as a graph on Cartesian or logarithmic coordinates. (ii) Alignment chart. Three or more scales arranged so that a straight line joining two known values cuts the third scale to give the required value." (Michael Macdonald-Ross, "Graphics in Texts", Review of Research in Education Vol. 5, 1977)

"Since the chief purpose of the nomogram is to make exact data available for operational use, its chief competitor is the table. Operational tables may break Ehrenberg's two-digit rule, since they are not used to detect general trends but to provide exact data for some operational purpose. The choice between nomogram and table involves a complex tradeoff among cost, space, convenience, accuracy, and speed. These tradeoff situations provide one good reason why no one graphic format is suitable for all purposes. Of course, there can be good methods (sarisfying solutions) for particular cases." (Michael Macdonald-Ross, "Graphics in Texts", Review of Research in Education Vol. 5, 1977)

"A great virtue of nomograms is that they are usually multivariate, showing relationships among variables in quite complex systems. It is surely helpful to have both an analysis of the underlying equation along with nomogram visualization of the curves generated by the equation. Nomograms show how equations perform. Nomograms remain useful for understanding; their computational use has passed. Computational power is so cheap now, we don’t need look-up tables or nomograms; we can just plug the numbers into the equations and solve." (Edward Tufte, 2002)

"Nomographs are effective ways to graphically calculate various functionally related quantities. Nomographs are really graphical computational devices. They were once used widely in engineering situations when calculating was more laborious than at the present time, and they still can be useful when complex relationships are concerned. In brief, scales are laid out in which the scale intervals and placement of the lines are chosen by well-established procedures. A straight edge can then be used to interconnect independent variables so the corresponding values of dependent variables can be read." (Cheryl Cihon & John K Taylor, "Statistical Techniques for Data Analysis" 2nd. ed., 2005)

"A nomogram not only sheds light on how the effect of one predictor on the probability of response depends on the levels of other factors, but it allows one to quickly estimate the probability of response for individual subjects." (Frank E. Harrell Jr, "Regression Modeling Strategies", 2015)

26 May 2026

📉Graphical Representation: Format (Just the Quotes)

"A graph presents a limited number of figures in a bold and forceful manner. To do this it usually must omit a large number of figures available on the subject. The choice of what graphic format to use is largely a matter of deciding what figures have the greatest significance to the intended reader and what figures he can best afford to skip." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"Any graphic format can be executed well, or poorly, for a particular purpose. This is often a more significant variable than the choice of format." (Macdonald-Ross, 1977)

"The main benefit of tabular presentation is its compactness; a great deal of data can be put on a single page. Also, even with the two-digit restriction a table presents numbers more exactly than bar or pie charts do. Therefore it seems likely that tables will remain the preferred format for professional users. The great weakness of tables is their abstract nature. A table consists entirely of abstract symbols-words and numbers." (Michael Macdonald-Ross, "Graphics in Texts", Review of Research in Education Vol. 5, 1977)

"Some believe that the vertical bar should be used when comparing similar items for different time periods and the horizontal bar for comparing different items for the same time period. However, most people find the vertical-bar format easier to prepare and read. and a more effective way to show most types of comparisons." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Charts are used to represent quantitative data in a graphic format. A chart visually illustrates relationships between numbers. When creating a chart, keep in mind that the goal is to represent the data in a simplified and appealing way so as not to muddle the message the chart is meant to convey." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"For a visual to qualify as beautiful, it must be aesthetically pleasing, yes, but it must also be novel, informative, and efficient. [...] For a visual to truly be beautiful, it must go beyond merely being a conduit for information and offer some novelty: a fresh look at the data or a format that gives readers a spark of excitement and results in a new level of understanding. Well-understood formats (e.g., scatterplots) may be accessible and effective, but for the most part they no longer have the ability to surprise or delight us. Most often, designs that delight us do so not because they were designed to be novel, but because they were designed to be effective; their novelty is a byproduct of effectively revealing some new insight about the world." (Noah Iliinsky, "On Beauty", [in "Beautiful Visualization"] 2010)

"The first requirement of a beautiful visualization is that it is novel, fresh, or unique. It is difficult (though not impossible) to achieve the necessary novelty using default formats. In most situations, well-defined formats have well-defined, rational conventions of use: line graphs for continuous data, bar graphs for discrete data, pie graphs for when you are more interested in a pretty picture than conveying knowledge." (Noah Iliinsky, "On Beauty", [in "Beautiful Visualization"] 2010)

"The best visualizations will reveal what is interesting about the specific data set you’re working with. Different data may require different approaches, encodings, or techniques to reveal its interesting aspects. While default visualization formats are a great place to start, and may come with the correct design choices pre-selected, sometimes the data will yield new knowledge when a different visualization approach or format is used." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"Infographics combine data with design to enable visual learning. This communication process helps deliver complex information in a way that is more quickly and easily understood. [...] In an era of data overload, infographics offer your audience information in a format that is easy to consume and share. [...] A well-placed, self-contained infographic addresses our need to be confident about the content we’re sharing. Infographics relay the gist of your information quickly, increasing the chance for it to be shared and fueling its spread across a wide variety of digital channels." (Mark Smiciklas, "The Power of Infographics: Using Pictures to Communicate and Connect with Your Audiences", 2012)

"Presenting data in a graphical format makes it much easier to see and understand what is happening with the data. Data visualization applies to all phases of the data science process." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"There is often no one 'best' visualization, because it depends on context, what your audience already knows, how numerate or scientifically trained they are, what formats and conventions are regarded as standard in the particular field you’re working in, the medium you can use, and so on. It’s also partly scientific and partly artistic, so you get to express your own design style in it, which is what makes it so fascinating." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Data is dirty. Let's just get that out there. How is it dirty? In all sorts of ways. Misspelled text values, date format problems, mismatching units, missing values, null values, incompatible geospatial coordinate formats, the list goes on and on." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

25 May 2026

✏️Michael Macdonald-Ross - Collected Quotes

"Any graphic format can be executed well, or poorly, for a particular purpose. This is often a more significant variable than the choice of format." (Macdonald-Ross, 1977)

"Notations and codes are invented because of the limitations of ordinary language: notations to say things that can hardly be expressed in ordinary language, and codes to hide messages that would otherwise be all too clear. Notations are certainly important for the growth and expression of ideas and hence are of interest to us." (Michael Macdonald-Ross, "Graphics in Texts", Review of Research in Education Vol. 5, 1977)

"The practitioner who designs a graphic device is acting, as we all do, with imperfect knowledge. A graphic device is an artifact, intended to get across a particular idea to some particular readers. There is no way a science of instruction could lay down minutely detailed prescriptions for all conceivable situations. This is simple realism. However, it is possible to put together the knowledge we already have, to improve it, and to make it more easily available. Reliable knowledge applied intelligently will improve the effectiveness of graphic communication." (Michael Macdonald-Ross, "Graphics in Texts", Review of Research in Education Vol. 5, 1977)

"To design a chart or table the designer may need to go back to source documents to check the definition of key terms, the sampling procedures, and so on. This does require some basic familiarity with research methods, and it may be that the training of graphic designers could be improved in this respect." (Michael Macdonald-Ross, "Graphics in Texts", Review of Research in Education Vol. 5, 1977)

"When hundreds of numbers are arrayed in a complex table most people find it difficult to sort out the significant features; indeed, there are many who cannot interpret even the simplest tables. No doubt it would help if such skills were taught in schools, but they are not, and the practicing communicator has to take people as they are. Therefore it is common practice to use charts for a general readership. Bar charts show quantity by length, and Isotype charts show quantity by rows of standard symbols. In effect this reduces the need for abstract cognition and offers the data as a series of visual comparisons (this/that, here/there, now/then)." (Michael Macdonald-Ross, "Graphics in Texts", Review of Research in Education Vol. 5, 1977)

24 May 2026

📉Graphical Representation: Perspectives (Just the Quotes)

"Comparison between circles of different size should be absolutely avoided. It is inexcusable when we have available simple methods of charting so good and so convenient from every point of view as the horizontal bar." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"In line charts with an arithmetic scale, it is essential to set the base line at zero in order that the correct perspective of the general movement may not be lost. Breaking or leaving off part of the scale leads to misinterpretation, because the trend then shows a disproportionate degree of variation in movement." (Mary E Spear, "Charting Statistics", 1952)

"The information on a plot should be relevant to the goals of the analysis. This means that in choosing graphical methods we should match the capabilities of the methods to our needs in the context of each application. [...] Scatter plots, with the views carefully selected as in draftsman's displays, casement displays, and multiwindow plots, are likely to be more informative. We must be careful, however, not to confuse what is relevant with what we expect or want to find. Often wholly unexpected phenomena constitute our most important findings." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Making a presentation is a moral act as well as an intellectual activity. The use of corrupt manipulations and blatant rhetorical ploys in a report or presentation - outright lying, flagwaving, personal attacks, setting up phony alternatives, misdirection, jargon-mongering, evading key issues, feigning disinterested objectivity, willful misunderstanding of other points of view - suggests that the presenter lacks both credibility and evidence. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and a moral activity." (Edward R Tufte, "Beautiful Evidence", 2006)

"Sorting data is one of the most efficient actions to derive different views of data in order to see the variables from many angles. Sorting is usually not applied to the data itself, but to statistical objects of a plot. We might want to sort the bars in a barchart, the variables in a parallel boxplot or the categories in a boxplot y by x." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"A beautiful visualization has a clear goal, a message, or a particular perspective on the information that it is designed to convey. Access to this information should be as straightforward as possible, without sacrificing any necessary, relevant complexity. [...] Most importantly, beautiful visualizations reflect the qualities of the data that they represent, explicitly revealing properties and relationships inherent and implicit in the source data. As these properties and relationships become available to the reader, they bring new knowledge, insight, and enjoyment." (Noah Iliinsky, "On Beauty", [in "Beautiful Visualization"] 2010)

"A persuasive visualization primarily serves the relationship between the designer and the reader. It is useful when the designer wishes to change the reader’s mind about something. It represents a very specific point of view, and advocates a change of opinion or action on the part of the reader. In this category of visualization, the data represented is specifically chosen for the purpose of supporting the designer’s point of view, and is presented carefully so as to convince the reader of same." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"Processes take place over time and result in change. However, we’re often constrained to depict processes in static graphics, perhaps even a single image. Luckily, a good static graphic can be just as successful, perhaps even more so, than an animation. Giving the reader the ability to see each 'frame' of time can of f er a valuable perspective." (Felice C Frankel & Angela H DePace, "Visual Strategies", 2012)

"Visualization can be appreciated purely from an aesthetic point of view, but it’s most interesting when it’s about data that’s worth looking at. That’s why you start with data, explore it, and then show results rather than start with a visual and try to squeeze a dataset into it. It’s like trying to use a hammer to bang in a bunch of screws. […] Aesthetics isn’t just a shiny veneer that you slap on at the last minute. It represents the thought you put into a visualization, which is tightly coupled with clarity and affects interpretation." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Interactivity is crucial for building vis tools that handle complexity. When datasets are large enough, the limitations of both people and displays preclude just showing everything at once; interaction where user actions cause the view to change is the way forward. Moreover, a single static view can show only one aspect of a dataset. For some combinations of simple datasets and tasks, the user may only need to see a single visual encoding. In contrast, an interactively changing display supports many possible queries. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"When you are exploring your data, look for alternate views of the data; you just may find a more interesting insight." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"First, from an ethos perspective, the success of your data story will be shaped by your own credibility and the trustworthiness of your data. Second, because your data story is based on facts and figures, the logos appeal will be integral to your message. Third, as you weave the data into a convincing narrative, the pathos or emotional appeal makes your message more engaging. Fourth, having a visualized insight at the core of your message adds the telos appeal, as it sharpens the focus and purpose of your communication. Fifth, when you share a relevant data story with the right audience at the right time (kairos), your message can be a powerful catalyst for change." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Well-designed data graphics provide readers with deeper and more nuanced perspectives, while promoting the use of quantitative information in understanding the world and making decisions." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Numbers can always yield multiple interpretations, and they may be approached from varied angles. We journalists don’t vary our approaches more often because many of us are sloppy, innumerate, or simply forced to publish stories at a quick pace. That’s why chart readers must remain vigilant. Even the most honest chart creator makes mistakes." (Alberto Cairo, "How Charts Lie", 2019)