28 June 2026

📉Graphical Representation: Illustrations (Just the Quotes)

"The visible figures by which principles are illustrated should, so far as possible, have no accessories. They should be magnitudes pure and simple, so that the thought of the pupil may not be distracted, and that he may know what features of the thing represented he is to pay attention to." (National Education Association, 1894)

"Most authors would greatly resent it if they were told that their writings contained great exaggerations, yet many of these same authors permit their work to be illustrated with charts which are so arranged as to cause an erroneous interpretation. If authors and editors will inspect their charts as carefully as they revise their written matter, we shall have, in a very short time, a standard of reliability in charts and illustrations just as high as now found in the average printed page." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"Nothing is so illuminating as a set of properly proportioned diagrams. [...] In addition to the significance of graphics in analytical work, it is likewise a valuable aid to the memory. A picture is manifestly more readily retained in mind than a description of the same subject, no matter how vividly it may have been expressed. A pictorial or diagrammatic illustration usually produces a firmer and more lasting impression than any composition of words or tabulation of figures, however well they may be arranged or set forth." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Though variety in method of charting is sometimes desirable in large reports where numerous illustrations must follow each other closely, or in wall exhibits where there must be a great number of charts in rapid sequence, it is better in general to use a variety of effects simply to attract attention, and to present the data themselves according to standard well-known methods." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"Admittedly a chart is primarily a picture, and for presentation purposes should be treated as such; but in most charts it is desirable to be able to read the approximate magnitudes by reference to the scales. Such reference is almost out of the question without some rulings to guide the eye. Second, the picture itself may be misleading without enough rulings to keep the eye 'honest'. Although sight is the most reliable of our senses for measuring" (and most other) purposes, the unaided eye is easily deceived; and there are numerous optical illusions to prove it. A third reason, not vital, but still of some importance, is that charts without rulings may appear weak and empty and may lack the structural unity desirable in any illustration." (Kenneth W Haemer, "Hold That Line. A Plea for the Preservation of Chart Scale Ruling", The American Statistician Vol. 1" (1) 1947)

"The impression created by a chart depends to a great extent on the shape of the grid and the distribution of time and amount scales. When your individual figures are a part of a series make sure your own will harmonize with the other illustrations in spacing of grid rulings, lettering, intensity of lines, and planned to take the same reduction by following the general style of the presentation." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"To analyse graphic representation precisely, it is helpful to distinguish it from musical, verbal and mathematical notations, all of which are perceived in a linear or temporal sequence. The graphic image also differs from figurative representation essentially polysemic, and from the animated image, governed by the laws of cinematographic time. Within the boundaries of graphics fall the fields of networks, diagrams and maps. The domain of graphic imagery ranges from the depiction of atomic structures to the representation of galaxies and extends into the spheres of topography and cartography." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"While circle charts are not likely to present especially new or creative ideas, they do help the user to visualize relationships. The relationships depicted by circle charts do not tend to be very complex, in contrast to those of some line graphs. Normally, the circle chart is used to portray a common type of relationship" (namely. part-to-total) in an attractive manner and to expedite the message transfer from designer to user." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Remember, the primary function of a graph of any kind is to illustrate the relationship between two variables. [...] To draw any graph we must have established some relationship between the two variables. This relationship can be in the form of a formula" (equation is the more mathematical term), as we have just seen, or simply a set of observations, as is common in all types of statistical work. Sometimes we develop set of observations and then try to find an equation that expresses, in mathematical language, the relationship between the two variables." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"The types of graphics used in operating a business fall into three main categories: diagrams, maps, and charts. Diagrams, such as organization diagrams, flow diagrams, and networks, are usually intended to graphically portray how an activity should be, or is being, accomplished, and who is responsible for that accomplishment. Maps such as route maps, location maps, and density maps, illustrate where an activity is, or should be, taking place, and what exists there. [...] Charts such as line charts, column charts, and surface charts, are normally constructed to show the businessman how much and when. Charts have the ability to graphically display the past, present, and anticipated future of an activity. They can be plotted so as to indicate the current direction that is being followed in relationship to what should be followed. They can indicate problems and potential problems, hopefully in time for constructive corrective action to be taken." (Robert D Carlsen & Donald L Vest, "Encyclopedia of Business Charts", 1977)

"A good graphic must give the impression that its various parts all belong together. They must be arranged in such a way that the illustration looks like a single entity. A good graphic chart should be more than just the sum of its individual lines, shapes, and shades. It should be more than the individual bars in a bar chart, more than the pieces of a pie chart, more than the boxes in a flow chart. Unity requires the establishment of coherent relationships among the component parts of the drawing. These relationships can be depicted in a very direct manner through the use of connecting lines that serve to connect shapes." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"A graphic is an illustration that, like a painting or drawing, depicts certain images on a flat surface. The graphic depends on the use of lines and shapes or symbols to represent numbers and ideas and show comparisons, trends, and relationships. The success of the graphic depends on the extent to which this representation is transmitted in a clear and interesting manner." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"If you want to dramatize comparisons in relation to the whole. use a pie chart. If you want to add coherence to the narrative, the pie chart also helps because it depicts a whole. If your main interest is in stressing the relationship of one factor to another, use bar charts. If you wish to achieve all these effects. you can use either type of chart. and decide on the basis of which one is more aesthetically or pictorially interesting." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Graphical competence demands three quite different skills: the substantive, statistical, and artistic. Yet now most graphical work, particularly at news publications, is under the direction of but a single expertise-the artistic. Allowing artist-illustrators to control the design and content of statistical graphics is almost like allowing typographers to control the content, style, and editing of prose. Substantive and quantitative expertise must also participate in the design of data graphics, at least if statistical integrity and graphical sophistication are to be achieved." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"As a general rule, plotted points and graph lines should be given more 'weight' than the axes. In this way the 'meat' will be easily distinguishable from the 'bones'. Furthermore, an illustration composed of lines of unequal weights is always more attractive than one in which all the lines are of uniform thickness. It may not always be possible to emphasise the data in this way however. In a scattergram, for example, the more plotted points there are, the smaller they may need to be and this will give them a lighter appearance. Similarly, the more curves there are on a graph, the thinner the lines may need to be. In both cases, the axes may look better if they are drawn with a somewhat bolder line so that they are easily distinguishable from the data." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"In the case of graphs, the number of lines which can be included on any one illustration will depend largely on how close the lines are and how often they cross one another. Three or four is likely to be the maximum acceptable number. In some instances, there may be an argument for using several graphs with one line each as opposed to one graph with multiple lines. It has been shown that these two arrangements are equally satisfactory if the user wishes to read off the value of specific points; if, however, he wishes to compare the lines, than the single multi-line graph is superior." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"The practice of framing an illustration with a drawn rectangle is not recommended. This kind of typographic detailing should never be added purely for aesthetic reasons or for decoration. A simple, purely functional drawing will automatically be aesthetically pleasing. Unnecessary lines usually reduce both legibility and attractiveness." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"Graphical illustrations should be simple and pleasing to the eye, but the presentation must remain scientific. In other words, we want to avoid those graphical features that are purely decorative while keeping a critical eye open for opportunities to enhance the scientific inference we expect from the reader. A good graphical design should maximize the proportion of the ink used for communicating scientific information in the overall display." (Phillip I Good & James W Hardin, "Common Errors in Statistics" (and How to Avoid Them)", 2003)

"[…] a graph is nothing but a visual metaphor. To be truthful, it must correspond closely to the phenomena it depicts: longer bars or bigger pie slices must correspond to more, a rising line must correspond to an increasing amount. If a graphical depiction of data does not faithfully follow this principle, it is almost sure to be misleading. But the metaphoric attachment of a graphic goes farther than this. The character of the depiction ism a necessary and sufficient condition for the character of the data. When the data change, so too must their depiction; but when the depiction changes very little, we assume that the data, likewise, are relatively unchanging. If this convention is not followed, we are usually misled." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Parallel coordinate plots are often overrated concerning their ability to depict multivariate features. Scatterplots are clearly superior in investigating the relationship between two continuous variables and multivariate outliers do not necessarily stick out in a parallel coordinate plot. Nonetheless, parallel coordinate plots can help to find and understand features such as groups/clusters, outliers and multivariate structures in their multivariate context. The key feature is the ability to select and highlight individual cases or groups in the data, and compare them to other groups or the rest of the data." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009) 

"Presentation graphics face the challenge to depict a key message in - usually a single - graphic which needs to fit very many observers at a time, without the chance to give further explanations or context. Exploration graphics, in contrast, are mostly created and used only by a single researcher, who can use as many graphics as necessary to explore particular questions. In most cases none of these graphics alone gives a comprehensive answer to those questions, but must be seen as a whole in the context of the analysis." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"The amount of information rendered in a single financial graph is easily equivalent to thousands of words of text or a page-sized table of raw values. A graph illustrates so many characteristics of data in a much smaller space than any other means. Charts also allow us to tell a story in a quick and easy way that words cannot." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Thinking of graphics as art leads many to put bells and whistles over substance and to confound infographics with mere illustrations." (Alberto Cairo, "The Functional Art", 2011)

"It is important to remember that a visual representation of a scientific concept (or data) is a re-presentation, and not the thing itself - some interpretation or translation is always involved. There are many parallels between creating a graphic and writing an article. First, you must carefully plan what to 'say', and in what order you will 'say it'. Then you must make judgments to determine a hierarchy of information - what must be included and what could be left out? The process of making a visual representation requires you to clarify your thinking and improve your ability to communicate with others. Furthermore, the process of making an effective graphic often leads to new insights into your work; when you make decisions about how to depict your data and underlying concepts, you must often clarify your basic assumptions." (Felice C Frankel & Angela H DePace, "Visual Strategies", 2012)

"Processes take place over time and result in change. However, we’re often constrained to depict processes in static graphics, perhaps even a single image. Luckily, a good static graphic can be just as successful, perhaps even more so, than an animation. Giving the reader the ability to see each 'frame' of time can of f er a valuable perspective." (Felice C Frankel & Angela H DePace, "Visual Strategies", 2012)

"When you decide how to depict your data, you decide on the abstraction. Will you present a graph? A cartoon? An accurate molecular model? And which features will you include in these representations? Your preferred abstraction should include all necessary information, exclude unnecessary information, and make use of your reader’s preexisting knowledge without being confined by it." (Felice C Frankel & Angela H DePace, "Visual Strategies", 2012)

"Good visualization is a winding process that requires statistics and design knowledge. Without the former, the visualization becomes an exercise only in illustration and aesthetics, and without the latter, one of only analyses. On their own, these are fine skills, but they make for incomplete data graphics. Having skills in both provides you with the luxury - which is growing into a necessity - to jump back and forth between data exploration and storytelling." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Line graphs that show more than one line can be useful for making comparisons, but sometimes it is important to discuss each individual line. By using sparklines evaluators can call attention to and discuss individual cases. Sparklines can be embedded within a sentence to illustrate a trend and help stakeholders better understand the data. Evaluators can use this simple visualization when creating reports." (Christopher Lysy, "Developments in Quantitative Data Display and Their Implications for Evaluation", 2013)

"Some scientists (e.g., econometricians) like to work with mathematical equations; others" (e.g., hard-core statisticians) prefer a list of assumptions that ostensibly summarizes the structure of the diagram. Regardless of language, the model should depict, however qualitatively, the process that generates the data - in other words, the cause-effect forces that operate in the environment and shape the data generated." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The effective depiction of an icon often depends on how semantically resonant the image is to the information it represents. The use of icons in charts depends on various factors, including task, how representative they are of the underlying data, and their general recognizability." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

27 June 2026

🎯Shadan Malik - Collected Quotes

"Alert-level control is another feature used to establish relevance within the content domain. Alerts help manage exceptions and alert the user of any unusual change or threshold value reached for any KPI. So, the action resulting from alerts needs to be assigned to those users who need to be informed of the exceptions." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"Alerts and KPI thresholds are two sides of the same coin. Alerts are actions taken once a KPI threshold is reached. However, alerts are not defined for every threshold boundary. For the most part, they serve as a warning system when a KPI shows poor performance or an undesired trend. Alerts must always be accompanied by attention-capturing actions such as automatic e-mails and/or visual indication such as blinking or animation on the dashboard. The other variable for alerts is the recipient. There may be one or more appropriate recipients for each alert." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"Alerts are integral to the dashboard concept in that they transform the dashboard from a graphical information presentation into a live console for managing organizational processes and performance. Effective dashboard deployment must facilitate easy management of alerts. This management process involves three components: (1) rules, (2) actions, and (3) recipients." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"Although it does so much more, the central purpose of a dashboard is to warn the user when any relevant metrics are out of acceptable boundaries. In the dashboard terminology, these alerts consisting of rules and actions add critical value to an enterprise dashboard deployment complemented with strong visual indicators of warnings." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"Charts also demand internal color choices: the colors of the pies, bars, speedometer thresholds, and so on. The default colors supplied by any standard dashboard software are often well selected with a professional designer’s input. However, a dashboard creator may have the liberty to change these colors at his or her discretion. If a dashboard is being deployed for a large audience, it is a good practice to seek advice from a professional designer in selecting the chart colors, so that they may have a positive visual appeal to the largest possible number of users. As every professional designer knows, there is a lot of science in color choice and its relative placements. Even more important, a spectrum of emotional messages is associated with each color." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"[...] in many instances the choice of charts may not be so obvious, requiring a degree of flexibility and creativity. Some of the contemporary, popular chart types include traffic lights, speedometers or dials, thermometers, donuts, and bubble charts. The choice of charts also depends on area constraints on the dashboard. For example, if the available area is narrow but high, a thermometer representation may work well instead of a speedometer, which requires more of a square-shaped area. Similarly, traffic lights may represent KPIs effectively within a relatively small area - just enough to have three small circles representing the three colored lamps in a traffic light. This model is also effective in conveying the relative performance of the charted KPIs: a red light jumps out at the viewer, drawing immediate attention." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"Metrics are measurements of activities to evaluate performance, mostly within a relative framework of time, geography, and aggregation." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"Speedometer chart types could be applied to contrast quota versus actual sales numbers for the sections and categories. Clicking on a given area of the chart could then lead to a more detailed report. Also, regional maps could be transposed with threshold-driven color-coded metrics for better visualization of various states within the region and also to show their comparative performance at a glance." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"Subject area is a surrogate layer of content grouping that helps in managing the content access to users. A subject area could be defined as a collection of dashboards, reports, charts, or KPIs." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

 "The dashboard framework must also facilitate a retracing of the drill-down path. A user should be easily able to get to the previous chart from the destination chart. This recursive capacity helps create a better self-guided analysis experience. If users are not able to retrieve the previous chart easily during a drill-down path, they may lose track of their thought sequence. An inability to retrace may lead to user frustration and a dysfunctional self-guided analysis." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"The distinguishing feature is that a dashboard is an application with a collection of metrics, benchmarks, goals, results, and alerts presented in a visually effective manner, whereas a portal is a collection of different applications presented together within a personalized framework." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"The term dashboard has acquired a vibrant new meaning in the field of information management as leading organizations worldwide embrace the idea of empowerment through improved real-time information systems. In the current corporate vocabulary, a dashboard is a rich computer interface with charts, reports, visual indicators, and alert mechanisms that are consolidated into a dynamic and relevant information platform." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"To establish a uniform performance benchmark across the organization, it is important that variance of a specific KPI be consistent across all of its possible grains." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"Variance establishes the comparison benchmark for each KPI. It has two requirements: (1) the basis for change and (2) change calculation. The most commonly applied references for the basis are relative periodic comparisons: year ago, quarter ago, and month ago. Other types of change basis are forecast, operational plan, quota, and so on. The most commonly applied values for change calculations are Difference, Percentage Change, and Percent Point Change." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

"Visualization is an issue at the heart of good dashboard software. Good visualization can be the difference between information overload and information insight. Commonly used graphs (charts) are one example of visualization. However, present-day technology has raised the bar of visualization beyond commonplace charts and data widgets. The three key characteristics requiring evaluation within the area of visualization are: (1) Visual intelligence ( 2) Geographic mapping (3) Screen resolution." (Shadan Malik, "Enterprise Dashboards: Design and best practices for IT", 2005)

🪙Business Intelligence: Insight (Just the Quotes)

"Knowledge workers and BI experts must continually evaluate the reports, dashboards, alerts, and other mechanisms for disseminating factual information to ensure the design facilitates insight." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"In fact, the analogy to storytelling is limited when applied to communicating with data. Data visualization has fundamental characteristics missing from traditional storytelling. For example, interactive data visualizations let audiences explore information to find insights that resonate with them. Visualizations take shape based to a large extent on the underlying data. And as this data changes, the emphasis and message of the visualization is likely to change." (Zach Gemignani et al, "Data Fluency", 2014)

"[…] the better insights are communicated, the more likely it is that data leads to positive action (in this case, better business decisions)." (Bernard Marr, ​​​​​​​"Data Strategy", 2017)

"Data Lake induces accessibility and catalyzes availability. It warrants data discovery platforms to soak the data trends at a horizontal scale and produce visual insights. It largely cuts down the time that goes into data preparation and exhaustive data analysis." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Bad data are expensive: my best estimate is that it costs a typical company 20% of revenue. Worse, they dilute trust - who would trust an exciting new insight if it is based on poor data! And worse still, sometimes bad data are simply dangerous; look at the damage brought on by the financial crisis, which had its roots in bad data." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Data storytelling provides a bridge between the worlds of logic and emotion. A data story offers a safe passage for your insights to travel around emotional pitfalls and through analytical resistance that typically impede facts." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"First, from an ethos perspective, the success of your data story will be shaped by your own credibility and the trustworthiness of your data. Second, because your data story is based on facts and figures, the logos appeal will be integral to your message. Third, as you weave the data into a convincing narrative, the pathos or emotional appeal makes your message more engaging. Fourth, having a visualized insight at the core of your message adds the telos appeal, as it sharpens the focus and purpose of your communication. Fifth, when you share a relevant data story with the right audience at the right time (kairos), your message can be a powerful catalyst for change." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Ensure you build into your data literacy strategy learning on data quality. If the individuals who are using and working with data do not understand the purpose and need for data quality, we are not sitting in a strong position for great and powerful insight. What good will the insight be, if the data has no quality within the model?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"In the same vein, data strategy is often a misnomer for a much wider scope of coverage, but the lack of coherence in how we use the language has led to data strategy being perceived to cover data management activities all the way through to exploitation of data in the broadest sense. The occasional use of information strategy, intelligence strategy or even data exploitation strategy may differentiate, but the lack of a common definition on what we mean tends to lead to data strategy being used as a catch-all for the more widespread coverage such a document would typically include. Much of this is due to the generic use of the term ‘data’ to cover everything from its capture, management, governance through to reporting, analytics and insight." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Current decision-making in business suffers from insight gaps. Organizations invest in data and analytics, hoping that will provide them with insights that they can use to make decisions, but in reality, there are many challenges and obstacles that get in the way of that process. One of the biggest challenges is that these organizations tend to focus on technology and hard skills only. They are definitely important, but you will not automatically get insights and better decisions with hard skills alone. Using data to make better data-informed decisions requires not only hard skills but also soft skills as well as mindsets." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Decision-makers are constantly provided data in the form of numbers or insights, or similar. The challenge is that we tend to believe every number or piece of data we hear, especially when it comes from a trusted source. However, even if the source is trusted and the data is correct, insights from the data are created when we put it in context and apply meaning to it. This means that we may have put incorrect meaning to the data and then made decisions based on that, which is not ideal. This is why anyone involved in the process needs to have the skills to think critically about the data, to try to understand the context, and to understand the complexity of the situation where the answer is not limited to just one specific thing. Critical thinking allows individuals to assess limitations of what was presented, as well as mitigate any cognitive bias that they may have." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Gaining more insight into data, simplifying data access, enabling shopping-for-data, augmenting traditional data governance, generating active metadata, and accelerating development of products and services are enabled by infusing AI into the Data Fabric architecture. An AI-infused Data Fabric is not only leveraging AI but also likewise an architecture to manage and deal with AI artefacts, including AI models, pipelines, etc." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Establishing a comprehensive observability architecture necessitates a systematic approach that spans the entirety of the data pipeline, from initial telemetry collection to actionable insights accessible by diverse stakeholders. The core objective is to unify distributed data sources - metrics, logs, traces, and quality signals - into a coherent framework that enables rapid diagnosis, continuous monitoring, and strategic decision-making." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"The lakehouse combines the best elements of data lakes and data warehouses for OLAP workloads. It merges the scalability and flexibility of data lakes with the management features and performance optimization of data warehouses. [...] A lakehouse eliminates the need for disjointed systems and provides a single, coherent platform for all forms of data analysis. Lakehouses enhance the performance of data queries and simplify data management, making it easier for organizations to derive insights from their data." (Denny Lee et al, "Delta Lake: The Definitive Guide", 2025)

"Viewing the dendrograms in high dimensions provides insight into how the algorithm has joined points to clusters. For example, single linkage often has edges leading to a single focal point, which might not yield a useful clustering but might help to 
identify outliers. If the edges point to multiple focal points, with long edges bridging gaps in the data, the result is more likely yielding a useful clustering." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

📉Graphical Representation: Sequences (Just the Quotes)

"Though variety in method of charting is sometimes desirable in large reports where numerous illustrations must follow each other closely, or in wall exhibits where there must be a great number of charts in rapid sequence, it is better in general to use a variety of effects simply to attract attention, and to present the data themselves according to standard well-known methods." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"The design process involves a series of operations. In map design, it is convenient to break this sequence into three stages. In the first stage, you draw heavily on imagination and creativity. You think of various graphic possibilities, consider alternative ways." (Arthur H Robinson, "Elements of Cartography", 1953)

"To analyse graphic representation precisely, it is helpful to distinguish it from musical, verbal and mathematical notations, all of which are perceived in a linear or temporal sequence. The graphic image also differs from figurative representation essentially polysemic, and from the animated image, governed by the laws of cinematographic time. Within the boundaries of graphics fall the fields of networks, diagrams and maps. The domain of graphic imagery ranges from the depiction of atomic structures to the representation of galaxies and extends into the spheres of topography and cartography." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"It is almost impossible to define 'time-sequence chart' in a clear and unambiguous manner because of the many forms and adaptations open to this type of chart. However. it might be said that, in essence, time-sequence chart portrays a chain of activities through time, indicates the type of activity in each link of the chain, shows clearly the position of the link in the total sequence chain, and indicates the duration of each activity. The time sequence chart may also contain verbal elements explaining when to begin an activity, how long to continue the activity, and a description of the activity. The chart may also indicate when to blend a given activity with another and the point at which a given activity is completed. The basic time-sequence chart may also be accompanied by verbal explanations and by secondary or contributory charts." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"A flow chart is a graphic method to show pictorially how a series of activities, procedures. operations. events. ideas, or other factors are related to each other. It shows the sequence, cycle. or flow of these factors and how they are connected in a series of steps from beginning to end." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Arbitrary category sequence and misplaced pie chart emphasis lead to general confusion and weaken messages. Although this can be used for quite deliberate and targeted deceit, manipulation of the category axis only really comes into its own with techniques that bend the relationship between the data and the optics in a more calculated way. Many of these techniques are just twins of similar ruses on the value axis. but are none the less powerful for that." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"In the binary digital world of computers, all information is reduced to sequences of zeros and ones. But there’s a space between zero and one, between the way the machine counts and thinks and the way we count and think." (Scott Rosenberg, "Dreaming in Code", 2007)

"At its best, a static visualization is like a powerful photograph - a carefully conceived, arranged, and executed vision that manages to portray the sequence or motion of a story without the actual deployment of movement." (Andy Kirk, "Data Visualization: A successful design process", 2012)

"The main advantage of decision tree models is that they are interpretable. It is relatively easy to understand the sequences of tests a decision tree carried out in order to make a prediction. This interpretability is very important in some domains. [...] Decision tree models can be used for datasets that contain both categorical and continuous descriptive features. A real advantage of the decision tree approach is that it has the ability to model the interactions between descriptive features. This arises from the fact that the tests carried out at each node in the tree are performed in the context of the results of the tests on the other descriptive features that were tested at the preceding nodes on the path from the root. Consequently, if there is an interaction effect between two or more descriptive features, a decision tree can model this."  (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"A time series is a sequence of values, usually taken in equally spaced intervals. […] Essentially, anything with a time dimension, measured in regular intervals, can be used for time series analysis." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"A semantic approach to visualization focuses on the interplay between charts, not just the selection of charts themselves. The approach unites the structural content of charts with the context and knowledge of those interacting with the composition. It avoids undue and excessive repetition by instead using referential devices, such as filtering or providing detail-on-demand. A cohesive analytical conversation also builds guardrails to keep users from derailing from the conversation or finding themselves lost without context. Functional aesthetics around color, sequence, style, use of space, alignment, framing, and other visual encodings can affect how users follow the script." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"The advantage of sequencing views in time is that each view can fully utilize the display space. There is no need to divide the space among views. Obviously, sequencing views in time is particularly suited to convey temporal characteristics of data. It can also be helpful to take the user on a journey from one data facet to another. However, presenting views in quick succession to the user also has some limitations. For example, it could be difficult to make sense of all the information provided during a sequence of views. Especially when sequences take a long time, users may be unable to follow and could drown in an indigestible flood of visual representations. Therefore, it is mandatory to provide interactive controls to pause, slow down, reverse, and advance the presentation." (Christian Tominski & Heidrun Schumann, "Interactive Visual Data Analysis", 2019)

"A semantic approach to visualization focuses on the interplay between charts, not just the selection of charts themselves. The approach unites the structural content of charts with the context and knowledge of those interacting with the composition. It avoids undue and excessive repetition by instead using referential devices, such as filtering or providing detail-on-demand. A cohesive analytical conversation also builds guardrails to keep users from derailing from the conversation or finding themselves lost without context. Functional aesthetics around color, sequence, style, use of space, alignment, framing, and other visual encodings can affect how users follow the script." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Beyond the design of individual charts, the sequence of data visualizations creates grammar within the exposition. Cohesive visualizations follow common narrative structures to fully express their message. Order matters. " (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Sequencing is relevant to all visualization (not just instructions) because the author can use graphics and conventions to sequence the reading of visualizations. Annotations, in particular, can be used very effectively to teach conventions and to influence sequencing." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

🎯🏭Eberhard Hechler - Collected Quotes

"A data architecture defines data standards in an organization, including how data is accessed and consumed. It furthermore describes the data structures used by the business units. Data integration also depends on the defined data architecture standards since data integration requires interaction between data." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"A Data Fabric has its focus more on the architectural underpinning, technical capabilities, and intelligent analysis to produce active metadata supporting a smarter, AI-infused system to orchestrate various data integration styles, enabling trusted and reusable data in a hybrid cloud landscape to be consumed by humans, applications, or other downstream systems. Data cataloging to generate and leverage active metadata is seen as a vital component of any Data Fabric." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"A Data Fabric needs to serve analytical and transactional data consumption patterns to, for instance, address MLOps, trustworthy AI, MDM, inferencing, IoT, edge, and 5G." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"A Data Mesh views data primarily as organized around domain owners who create business-focused data products, which can be aggregated and consumed across distributed consumers, organizations, and Line of Business (LoBs) in a self-service and shopping-for-data fashion. Transforming data from disparate data sources to be consumed as data-as-a-product is an essential paradigm of any Data Mesh." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"A data product is based on semantically related raw data that is transformed into a meaningful business context and easily discoverable and consumable by business users." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"An enterprise data warehouse is a central repository of integrated and transformed, structured data from disparate sources and used for reporting and data analysis." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Any project execution would be very difficult without implementation and usage of the right product capabilities. The selected products should support the data sources and platforms in your organization and provide AI-augmented functionality to ingest and automatically enrich metadata, allowing business users to easily understand, collaborate, enrich, and access the right data, to quickly establish an environment for highly automated and consistent governance and automatically secure data across the organization."(Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Building a data product is enabled by the data domain owner; however, building a data product itself is primarily driven by the data product owner, which can be a marketing or a customer care organization, an after-sales team, or even an individual business user. The data product owner is collaborating with data engineers, data scientists, and other subject matter experts throughout the entire data product build process." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Data Fabric and Data Mesh provide a unified enterprise data architecture and solution for consolidating dispersed data from a hybrid cloud environment through automated data discovery, smart data integration, and intelligent cataloging." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Data Fabric architecture utilizes active metadata, knowledge graphs, and semantic enrichment, combining intelligent information integration and transformation technologies to intelligently support data consumers, for example, business users."  (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Data Fabric is an integrated layer of data sources and connection processes based on active metadata." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Data lineage and provenance are often used interchangeably. Both terms refer to the entire lifecycle of the data, including the five Ws: (a) where the data originates, (b) where the data has been and where is the destination, (c) who made changes to the data, (d) when the data was created or updated, and (e) where the data is stored and used. Knowing answers to these questions is critical to data consumers to trust analytics outcomes derived from data." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Data management is the process of developing, implementing, and monitoring systems, procedures, and practices to deliver and enhance the value of data and assets throughout their lifecycle, while data and AI governance is defined as the exercise of authority and control during the management of data and assets." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Data Mesh self-service capabilities are business- and domain-centric; they are geared toward building, delivering, and managing data products in a concrete business, domain, or industry context." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Definition of data and AI governance policies, rules, and classifications is critical to break down data silos, allow for a uniform data consumption, and prevent misuse of data. It includes monitoring of compliance and enforcement of data and AI rules and policies on an ongoing basis, as well as ensuring compliance with regulations and laws." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Drift measures the drop in accuracy and drop in data consistency by comparing accuracy during runtime with the accuracy during training and by comparing key characteristics of the dataset used for training with the dataset during runtime." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Exploiting semantic knowledge graphs can support interpretability and explainability of nearly all AI model types (including DL models) by discovering and depicting semantic and non-obvious relationships or depicting an ML model in a simplified and more readable, explainable way." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Gaining more insight into data, simplifying data access, enabling shopping-for-data, augmenting traditional data governance, generating active metadata, and accelerating development of products and services are enabled by infusing AI into the Data Fabric architecture. An AI-infused Data Fabric is not only leveraging AI but also likewise an architecture to manage and deal with AI artefacts, including AI models, pipelines, etc." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"In Exploiting semantic knowledge graphs can support interpretability and explainability of nearly all AI model types (including DL models) by discovering and depicting semantic and non-obvious relationships or depicting an ML model in a simplified and more readable, explainable way., a Data Mesh solution organizes data around business domain owners and transforms relevant data assets (data sources) to data products that can be consumed by distributed business users from various business domains or functions. These data products are created, governed, and used in an autonomous, decentralized, and self-service manner. Self-service capabilities, which we have already referenced as a Data Fabric capability, enable business organizations to entertain a data marketplace with shopping-for-data characteristics." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"It is essential to realize that the Data Fabric architecture enables the Data Mesh solution via its rich knowledge catalog, semantic search and discovery, smart integration capabilities, and semantic knowledge graphs. Trustworthy AI, for instance, is enabled via the Data Fabric as well." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023) 

"[...] it is the Data Fabric architecture that enables the Data Mesh. In other words, the Data Fabric is the architectural underpinning to implement a Data Mesh solution." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Over 80% of models are never operationalized because the efforts involved in deploying them are enormous and the models are deployed and found to produce drift or fairness issues that outweigh the benefits."  (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Semantic enrichment is the process of adding meaning to data, which is represented as additional metadata in the knowledge catalog. The intent of semantic enrichment is to simplify and optimize some of the key Data Fabric and Data Mesh tasks, such as search and discovery of assets, access, and consumption of assets by applications and business users to build corresponding data products." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"The AI lifecycle comprises of business problem understanding, collecting data, preparing data, building the model, deploying the model, monitoring the model, and governing the model." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"The aim of a Data Mesh solution is to establish a data marketplace where data can be searched for, discovered, and consumed as a product." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"The Data Fabric architecture needs to guarantee this single version of the truth within the application and transactional landscape, which – depending on the deployment option of an MDM solution – could also mean to assemble this single version of the truth based on core information that is dispersed and maintained in various data stores." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"The Data Fabric architecture can help enterprises address the challenges of data and AI governance effectively, including the orchestration and exchange of metadata across organizational implementations. First, Data Fabric pulls data from disparate data sources and orchestrates metadata exchange across organizational systems, thus providing a holistic view of data and AI at the enterprise level, which lays a solid technology foundation for a consistent and unified enterprise-level data and AI governance. Likewise, a Data Fabric architecture serves as a foundation for a Data Mesh solution, which is supporting organizational or departmental data and AI governance initiatives. Second, the advanced automation and AI technologies employed by a Data Fabric architecture can greatly simplify the implementation of data and AI governance at the enterprise or organizational level, enabling organizational federated Data Mesh initiatives, where orchestration and exchange of metadata across organizations need to be implemented as well." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"The goal of semantic enrichment is to simplify and optimize some of the key Data Fabric and Data Mesh tasks, such as search and discovery of assets, access, and consumption of assets by applications and business users." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"The terms Data Fabric and Data Mesh are often viewed as different, conflicting, or at the best overlapping data architectures or frameworks, data management concepts, or approaches to discover, explore, govern, and consume data. However, these concepts are related to each other, where each concept emphasizes specific imperatives or objectives."(Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"The term data governance is used for the processes and responsibilities that define, manage, and enforce access, privacy, availability, and security of the organization’s data. It typically includes a set of policies, rules, and data classifications and functionality to monitor and enforce compliance. As stated earlier, we use the term AI governance in a broader sense, also including AI artefacts." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"The value of a Data Mesh solution is that it assigns the creation of data products to data engineers and subject matter experts upstream who are most familiar with the business domains and corresponding needs." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"While a Data Fabric is an architecture that facilitates the end-to-end integration of various data and AI pipelines across hybrid cloud environments through the use of intelligent and automated systems and applications, a Data Mesh should be seen as a solution, which is geared toward delivering data-as-a-product in an organizational federated approach." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

26 June 2026

📉Graphical Representation: Background (Just the Quotes)

"A warning seems justifiable that the background of a chart should not be made any more prominent than actually necessary. Many charts have such heavy coordinate ruling and such relatively narrow lines for curves or other data that the real facts the chart is intended to portray do not stand out clearly from the background. No more coordinate lines should be used than are absolutely necessary to guide the eye of the reader and to permit an easy reading of the curves." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"When plotting any curve the vertical scale should, if possible, be chosen so that the zero of the scale will appear on the chart. Otherwise, the reader may assume the bottom of the chart to be zero and so be grossly misled. Zero should always be indicated by a broad line much wider than the ordinary co-ordinate lines used for the background of the chart." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"Be aware that bar charts provide ample opportunities for chart junk. The space within the bars is enticingly empty and it is tempting to put images or textures in the background. Some designers even swap out the standard bars for graphics." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Part of the problem with using gauges and dials as alerts is that they become part of the background. If 99% of the time the needle sits in the green, the gauge isnʼt worth looking at; then that one per cent of the time when it is in the red, the gauge will go unnoticed." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Further develop the situation or problem by covering relevant background. Incorporate external context or comparison points. Give examples that illustrate the issue. Include data that demonstrates the problem. Articulate what will happen if no action is taken or no change is made. Discuss potential options for addressing the problem. Illustrate the benefits of your recommended solution." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"In the field of design, experts speak of objects having 'affordances'. These are aspects inherent to the design that make it obvious how the product is to be used. For example, a knob affords turning, a button affords pushing, and a cord affords pulling. These characteristics suggest how the object is to be interacted with or operated. When sufficient affordances are present, good design fades into the background and you don’t even notice it." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"One thing to keep in mind with a table is that you want the design to fade into the background, letting the data take center stage. Don’t let heavy borders or shading compete for attention. Instead, think of using light borders or simply white space to set apart elements of the table." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"Visualizations can remove the background noise from enormous sets of data so that only the most important points stand out to the intended audience. This is particularly important in the era of big data. The more data there is, the more chance for noise and outliers to interfere with the core concepts of the data set." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)


🗃️Data Management: Timeliness (Just the Quotes)

"We analyze numbers in order to know when a change has occurred in our processes or systems. We want to know about such changes in a timely manner so that we can respond appropriately. While this sounds rather straightforward, there is a complication - the numbers can change even when our process does not. So, in our analysis of numbers, we need to have a way to distinguish those changes in the numbers that represent changes in our process from those that are essentially noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Many management reports are not a management tool; they are merely memorandums of information. As a management tool, management reports should encourage timely action in the right direction, by reporting on those activities the Board, management, and staff need to focus on. The old adage 'what gets measured gets done' still holds true." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"The data architecture is the most important technical aspect of your business intelligence initiative. Fail to build an information architecture that is flexible, with consistent, timely, quality data, and your BI initiative will fail. Business users will not trust the information, no matter how powerful and pretty the BI tools. However, sometimes it takes displaying that messy data to get business users to understand the importance of data quality and to take ownership of a problem that extends beyond business intelligence, to the source systems and to the organizational structures that govern a company’s data." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Access to more information isn’t enough - the information needs to be correct, timely, and presented in a manner that enables the reader to learn from it. The current network is full of inaccurate, misleading, and biased information that often crowds out the valid information. People have not learned that 'popular' or 'available' information is not necessarily valid." (Gene Spafford, 2010) 

"The first myth is that prediction is always based on time-series extrapolation into the future (also known as forecasting). This is not the case: predictive analytics can be applied to generate any type of unknown data, including past and present. In addition, prediction can be applied to non-temporal (time-based) use cases such as disease progression modeling, human relationship modeling, and sentiment analysis for medication adherence, etc. The second myth is that predictive analytics is a guarantor of what will happen in the future. This also is not the case: predictive analytics, due to the nature of the insights they create, are probabilistic and not deterministic. As a result, predictive analytics will not be able to ensure certainty of outcomes." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"Data governance policies must not enforce constraints on data - Data governance intends to control the level of democracy within the data lake. Its sole purpose of existence is to maintain the quality level through audits, compliance, and timely checks. Data flow, either by its size or quality, must not be constrained through governance norms. [...] Effective data governance elevates confidence in data lake quality and stability, which is a critical factor to data lake success story. Data compliance, data sharing, risk and privacy evaluation, access management, and data security are all factors that impact regulation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Timeliness means that information is available when it is needed. Most managers function in a dynamic environment of change, demands updated and current information. Computerised information systems have the ability to gather, sort, analyse, store, retrieve, and transmit large amounts of information in a very short period of time. Completeness of information is the extent to which information is all there." (C S V Murthy, "Data and Businesss Analytics", 2020)

"Data marts are subject-oriented databases typically aligned with a particular business unit like sales, finance, or marketing. These are sometimes called 'functional data marts' since they support specific business functions. Data marts accelerate business processes by allowing access to relevant information in a more timely nature since they are not aggregating the volume and variety (many data sources) that an EDW does. However, they are more transformed or normalized than an ODS." (Scott Burk et al, It’s All Analytics - Part II: Designing an Integrated AI, Analytics, and Data Science Architecture for Your Organization, 2022)

"Data are most valuable at their point of origin. The value of data is directly related to their timeliness." (Lawrence M Miller)


🤖Prompt Engineering: Models (Just the Quotes)

"An internal model allows a system to look ahead to the future consequences of current actions, without actually committing itself to those actions. In particular, the system can avoid acts that would set it irretrievably down some road to future disaster ('stepping off a cliff'). Less dramatically, but equally important, the model enables the agent to make current 'stage-setting' moves that set up later moves that are obviously advantageous. The very essence of a competitive advantage, whether it be in chess or economics, is the discovery and execution of stage-setting moves." (John H Holland, 1992)

"[...] building an effective LLM-based application can require more than just plugging in a pre-trained model and retrieving results - what if we want to parse them for a better user experience? We might also want to lean on the learnings of massively large language models to help complete the loop and create a useful end-to-end LLM-based application. This is where prompt engineering comes into the picture." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Agentic workflows break when the logic is messy - if, say, the plans don’t decompose or memory is poorly structured. However, infrastructure-level LLM applications introduce even more failure points and complexity. If the protocols don’t sync with each other, or the data flows start leaking, or the model boundaries are unclear... there are far too many failure points to count. While most people have been jumping on the bandwagon to adopt MCPs or A2A, very few are equipped to handle the LLMOps issues these tools introduce." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"As the tech industry moves from non-generative models to generative models, it is shifting away from feature engineering, or creating features to model the data and experimenting with different hyperparameters to optimize performance. Generative models, and specifically LLMs, do not require feature engineering. Today, the core requirements are usually prompt engineering or building a RAG pipeline - skills that lie within the domain of AI engineers." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"In prompt engineering, we customize the prompts or questions we give the model to get more accurate or insightful responses. The way a prompt is structured has a massive impact on how well a model understands the task at hand and, ultimately, how well it performs. Given LLMs’ versatility, prompt engineering has become an important skill for getting the most out of these models across different domains and tasks. The key is to understand how different prompt structures lead to different model behaviors. There are various strategies - ranging from simple one-shot prompting to more complex techniques like chain-of-thought prompting - that can significantly improve the effectiveness of LLMs." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"[...] prompt engineering, the science and art of crafting the text inputs that are sent to the models. Prompt updates can significantly improve or degrade the user experience. But prompt engineering is iterative and can be difficult to master and document, especially with closed-source LLMs." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"Prompt engineering is a crucial aspect of working with large language models (LLMs) like OpenAI's GPT, Google's PaLM, and others in the space of AI and machine learning. It involves the art and science of designing inputs (prompts) in a way that maximizes the quality, relevance, and accuracy of the AI-generated output. As the capabilities of AI continue to improve, the task of crafting effective prompts has become an essential skill for anyone leveraging these tools for real-world applications, including natural language understanding, translation, summarization, code generation, and more." (Code Planet, "Python for Large Language Models", 2025)

"Prompt injection is a security vulnerability that is specific to AI systems, especially LLM systems, in which malicious users try to manipulate prompts to make a model behave in a certain unintended way. They may try to get it to leak data, execute unauthorized tasks (especially with agentic systems), or ignore constraints. This is possible because LLMs are typically encapsulated inside applications using metaprompts, which are developer-created instructions that define the model’s behavior. Metaprompts usually contain safeguard instructions, such as 'do not use curse words', and placeholders where the input submitted by the user is pasted. The user’s input is combined with the metaprompts into a larger prompt that then goes to the model." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"There are three techniques for model domain adaptation: prompt engineering, RAG, and fine-tuning. Strictly speaking, RAG is a form of dynamic prompt engineering where developers use a retrieval system to add content to an existing prompt, but RAG systems are used so often that it’s worth discussing them separately. One critical difference with fine-tuning is that you must have access to the model’s weights, information that is usually not available with cloud-based, proprietary LLMs." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"With MCP, a model no longer has to guess what’s possible. Instead, it can discover tools, query data sources, and select prompts - all in real time, all through a shared protocol. This means a model doesn’t just generate responses; it acts, it calls tools, it gathers context, and it learns how to interact with the outside world in a modular,controlled way." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

25 June 2026

🤖Prompt Engineering: Prompt Engineering (Just the Quotes)

"[...] building an effective LLM-based application can require more than just plugging in a pre-trained model and retrieving results - what if we want to parse them for a better user experience? We might also want to lean on the learnings of massively large language models to help complete the loop and create a useful end-to-end LLM-based application. This is where prompt engineering comes into the picture." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Prompt engineering involves crafting inputs to LLMs (prompts) that effectively communicate the task at hand to the LLM, leading it to return accurate and useful outputs. Prompt engineering is a skill that requires an understanding of the nuances of language, the specific domain being worked on, and the capabilities and limitations of the LLM being used." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"As the tech industry moves from non-generative models to generative models, it is shifting away from feature engineering, or creating features to model the data and experimenting with different hyperparameters to optimize performance. Generative models, and specifically LLMs, do not require feature engineering. Today, the core requirements are usually prompt engineering or building a RAG pipeline - skills that lie within the domain of AI engineers." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"In prompt engineering, we customize the prompts or questions we give the model to get more accurate or insightful responses. The way a prompt is structured has a massive impact on how well a model understands the task at hand and, ultimately, how well it performs. Given LLMs’ versatility, prompt engineering has become an important skill for getting the most out of these models across different domains and tasks. The key is to understand how different prompt structures lead to different model behaviors. There are various strategies - ranging from simple one-shot prompting to more complex techniques like chain-of-thought prompting - that can significantly improve the effectiveness of LLMs." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"[...] prompt engineering, the science and art of crafting the text inputs that are sent to the models. Prompt updates can significantly improve or degrade the user experience. But prompt engineering is iterative and can be difficult to master and document, especially with closed-source LLMs." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"Prompt engineering is a crucial aspect of working with large language models (LLMs) like OpenAI's GPT, Google's PaLM, and others in the space of AI and machine learning. It involves the art and science of designing inputs (prompts) in a way that maximizes the quality, relevance, and accuracy of the AI-generated output. As the capabilities of AI continue to improve, the task of crafting effective prompts has become an essential skill for anyone leveraging these tools for real-world applications, including natural language understanding, translation, summarization, code generation, and more." (Code Planet, "Python for Large Language Models", 2025)

"There are three techniques for model domain adaptation: prompt engineering, RAG, and fine-tuning. Strictly speaking, RAG is a form of dynamic prompt engineering where developers use a retrieval system to add content to an existing prompt, but RAG systems are used so often that it’s worth discussing them separately. One critical difference with fine-tuning is that you must have access to the model’s weights, information that is usually not available with cloud-based, proprietary LLMs." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"The art of mega-prompts spanning multiple written pages and looking like essays has become commonplace for complex tasks when building applications to get things 'just right'. Unfortunately, they bring with them lots of issues: errors, portability, complexity, and more. The GenAI world didn’t plan for mega-prompts. They have simply evolved into what they’ve become today because practitioners kept wanting to do more and more complex things, and their only way to express those intents was with a prompt. But step back and look at some of these prompts [...] Lurking just below the surface are a bunch of classical computing concepts like data, programming instructions, control flows, memory, and stora - all the components typically associated with classical computing elements." (Rob Thomas et al, "AI Value Creators: Beyond the Generative AI User Mindset", 2025)

24 June 2026

🖍️Dianne Cook - Collected Quotes

"A common myth is that non-linear dimension reduction captures non-linear patterns in the high-dimensional data. It may or may not do this. The term means that the methods transform the data non-linearly into a useful (or not) visual representation." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"Bias and variance are conceptual constructs. Bias is not possible to quantify unless a true model is known. It is used for setting up simulations and comparing various models, because in these controlled scenarios bias and variance can be computed. In practice, it is not possible to compute. Using high-dimensional visualisation can help with understanding the shape of the class and separation between classes. This provides a better sense about whether a particular approach will be able to capture the shape of the boundary or not, and will thus likely have low or high bias." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"Defining an appropriate distance metric from the context ofthe problem is a most important decision. For example, if your variables are all numeric, and on the same scale, then Euclidean distance might be best. If your variables are categorical, you might need to use something like Hamming distance." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"Hierarchical clustering is summarised by a dendrogram, which sequentially shows points being joined to form a cluster, with the corresponding distances. Breaking the data into clusters is done by cutting the dendrogram at the long edges. [...] Plotting the dendrogram in the data space can help you understand how the hierarchical clustering has collected the points together into clusters. You can learn if the algorithm has been confused by nuisance patterns in the data, and how different choices of linkage method affect the result." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"High-dimensional data spaces are fascinating places. You may think that there are a lot of ways to plot one or two variables, and a lot of types of patterns that can be found. You might use a density plot and see skewness or a dot plot to find outliers. A scatterplot of two variables might reveal a non-linear relationship or a barrier beyond which no observations exist. We don’t as yet have so many different choices of plot types for high dimensions, but these types of patterns are also what we seek in scatterplots of high-dimensional data. The additional dimensions can clarify these patterns, so that clusters are likely to be more distinct. Observations that did not appear to be very different can be seen to be lonely anomalies in high dimensions, and that no other observations have quite the same combination of values." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"It is important to visualise your data because you might discover things that you could never have anticipated. Although there are many resources available for data visualisation, there are few comprehensive resources on high-dimensional data visualisation. High-dimensional (or multivariate) data arises when many different things are measured for each observation. While we can learn many things from plotting with 1D and 2D or 3D methods there are likely more structures hidden in the higher dimensions." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"Non-linear dimension reduction (NLDR) aims to find a single low-dimensional representation of the high-dimensional data that shows the main features of the data. If there are separated clusters present, then it might be a layout where the clusters are all distinct, in a way that a single linear projection could not reveal. For observations falling on a low-dimensional non-linear manifold in high dimensions the NLDR might unfold or unroll it so that they are represented in a plane where the distances are similar to their distance along the manifold." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"PCA (Principal Component Analysis) is very broadly useful for summarising linear association by using combinations of the variables that are highly correlated. However, high correlation can also occur when there are outliers or clustering. PCA is commonly used to detect these patterns also, although this might NOT be a reliable way to do so. To detect clustering or anomalies, using a different approach that is specifically focused on these types of patterns is advisable. To some extent capturing clustering or anomalies using PCA is actually finding problematic patterns that adversely affect conducting appropriate dimension reduction." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"PCA (Principal Component Analysis) is not very effective when the distribution of the variables is highly skewed, so it can be helpful to transform variables to make them more symmetrically distributed before conducting PCA. It is also possible to summarise different types of structure by generalising the optimisation criteria to any function of projected data, f(XA), which is called projection pursuit (PP)." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"Unsupervised classification, or cluster analysis, organizes observations into similar groups. Clusteranalysis is a commonly used, appealing, and conceptually intuitive statistical method. Some of its uses include market segmentation, where customers are grouped into clusters with similar attributes for targeted marketing; gene expression analysis, where genes with similar expression patterns are grouped together; and the creation of taxonomies for animals, insects, or plants. Clustering can be used as a way of reducing a massive amount of data because observations within a cluster can be summarised by its centre. Also, clustering effectively subsets the data thus simplifying analysis because observations in each cluster can be analysed separately." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"The way variables are scaled can affect the appearance of dimensionity. If the variables are scaled together, using global values, some variables may have smaller variance than others. Scaling variables individually shifts the focus to association between variables, as the predominant reason for reduced dimension." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"To determine which variables are responsible for the reduced dimension look for the axes that extend out of the point cloud. These contribute to smaller variation in the observations, and thus indicate possible dimension reduction using these variables." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"To understand variance, we need to know how the model fit changes when a different training sample is used to fit the model. This is achieved by dividing the training sample into folds and fitting a model to each fold. This is more difficult to evaluate with visual methods because it would require examining multiple samples for small differences." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"Viewing the dendrograms in high dimensions provides insight into how the algorithm has joined points to clusters. For example, single linkage often has edges leading to a single focal point, which might not yield a useful clustering but might help to 
identify outliers. If the edges point to multiple focal points, with long edges bridging gaps in the data, the result is more likely yielding a useful clustering." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

"When exploring the implicit dimensionality of multivariate data we are looking for projections where the points do not fill the plotting canvas fully. This would indicate that the observed values do not fully populate the high dimensions." (Dianne Cook & Ursula Laa, "Interactively Exploring High-Dimensional Data and Models in R", 2026)

📉Graphical Representation: Function (Just the Quotes)

"The best-known function of charts is for demonstration purposes, to show up facts. When so presented they do not require a trained mind for their appreciation, since the spatial sense through the optic nerve is among the commonest of the human attributes." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Under certain conditions, however, the ordinary form of graphic chart is slightly misleading. It will be conceded that its true function is to portray comparative fluctuations. This result is practically secured when the factors or quantities compared are nearly of the same value or volume, but analysis will show that this is not accomplished when the amounts compared differ greatly in value or volume. [...] The same criticism applies to charts which employ or more scales for various curve. If the different scale are in proper proportion, the result is the same as with one scale, but when two or more scales are used which are not proportional an indication may be given with respect to comparative fluctuations which is absolutely false." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Graphic presentation is a functional form of art as much as modern painting or architectural design. The painter studies his subject to determine what colors and style and design will best express his ideas. The same kind of imagination is exercised by the graphic artist and analyst. In addition, the graphic analyst has some of the same problems as the architect. The modern architect studies the family, its hobbies, interests, ambitions, and financial status, among other things, before he designs the new home. The graphic analyst should make just as thorough a study of the characteristics of the data and file uses for which it is intended before he designs his project. In the same way that the architect must know his materials and how they can best be used both in traditional ways and in new ways of his own devising, so must the graphic analyst be familiar with materials and techniques." (Mary E Spear, "Charting Statistics", 1952)

"A drawing can show a true picture of both the situation as a whole and its separate components at a glance, and do the job better than could figures or the spoken word. In its essence, a chart is a medium of communication conveying a thought, an idea, a situation from one mind to another and not a work of art or a statistical table. The simpler, the more direct it is, the better it will perform that service which is its sole function." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Because ease of use is the purpose, this ratio of function to conceptual complexity is the ultimate test of system design. Neither function alone nor simplicity alone defines a good design. [...] Function, and not simplicity, has always been the measure of excellence for its designers." (Fred P Brooks, "The Mythical Man-Month: Essays", 1975)

"Remember, the primary function of a graph of any kind is to illustrate the relationship between two variables. [...] To draw any graph we must have established some relationship between the two variables. This relationship can be in the form of a formula" (equation is the more mathematical term), as we have just seen, or simply a set of observations, as is common in all types of statistical work. Sometimes we develop set of observations and then try to find an equation that expresses, in mathematical language, the relationship between the two variables." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"Graphic forms help us to perform and influence two critical functions of the mind: the gathering of information and the processing of that information. Graphs and charts are ways to increase the effectiveness and the efficiency of transmitting information in a way that enhances the reader's ability to process that information. Graphics are tools to help give meaning to information because they go beyond the provision of information and show relationships, trends, and comparisons. They help to distinguish which numbers and which ideas are more important than others in a presentation." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Graphs can present internal accounting data effectively. Because one of the main functions of the accountant is to communicate accounting information to users. accountants should use graphs, at least to the extent that they clarify the presentation of accounting data. present the data fairly, and enhance management's ability to make a more informed decision. It has been argued that the human brain can absorb and understand images more easily than words and numbers, and, therefore, graphs may be better communicative devices than written reports or tabular statements." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"In order to be easily understood, a display of information must have a logical structure which is appropriate for the user's knowledge and needs, and this structure must be clearly represented visually. In order to indicate structure, it is necessary to be able to eemphasiz, divide and relate items of information. Visual emphasis can be used to indicate a hierarchical relationship between items of information, as in the case of systems of headings and subheadings for example. Visual separation of items can be used to indicate that they are different in kind or are unrelated functionally, and similarly a visual relationship between items will imply that they are of a similar kind or bear some functional relation to one another. This kind of visual 'coding' helps the reader to appreciate the extent and nature of the relationship between items of information, and to adopt an appropriate scanning strategy." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"The basic principle which should be observed in designing tables is that of grouping related data, either by the use of space or, if necessary, rules. Items which are close together will be seen as being more closely related than items which are farther apart, and the judicious use of space is therefore vitally important. Similarly, ruled lines can be used to relate and divide information, and it is important to be sure which function is required. Rules should not be used to create closed compartments; this is time-wasting and it interferes with scanning." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"The practice of framing an illustration with a drawn rectangle is not recommended. This kind of typographic detailing should never be added purely for aesthetic reasons or for decoration. A simple, purely functional drawing will automatically be aesthetically pleasing. Unnecessary lines usually reduce both legibility and attractiveness." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"A coordinate is a number or value used to locate a point with respect to a reference point, line, or plane. Generally the reference is zero. […] The major function of coordinates is to provide a method for encoding information on charts, graphs, and maps in such a way that viewers can accurately decode the information after the graph or map has been generated. " (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"The main goal of data visualization is its ability to visualize data, communicating information clearly and effectively. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex dataset by communicating its key aspects in a more intuitive way. Yet designers often tend to discard the balance between design and function, creating gorgeous data visualizations which fail to serve its main purpose - communicate information." (Vitaly Friedman, "Data Visualization and Infographics", Smashing Magazine, 2008

"Usually, diagrams contain some noise – information unrelated to the diagram’s primary goal. Noise is decorations, redundant, and irrelevant data, unnecessarily emphasized and ambiguous icons, symbols, lines, grids, or labels. Every unnecessary element draws attention away from the central idea that the designer is trying to share. Noise reduces clarity by hiding useful information in a fog of useless data. You may quickly identify noise elements if you can remove them from the diagram or make them less intense and attractive without compromising the function." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

"The sizes of charts in space reflect how we convey information to a reader. In a dashboard context, the content, size, and space that the various charts occupy should reflect the form and function of the main message. As you saw with the bento box metaphor from the introduction, there needs to be deliberate thought put into the placement and size of each individual chart so that they all work together in harmony." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Good design serves a more important function than simply pleasing you: It helps you access ideas. It improves your comprehension and makes the ideas more persuasive. Good design makes lesser charts good and good charts transcendent." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

"Graphic design is not just about making things look good. It is a powerful combination of form and function that uses visual elements to communicate a message. Form refers to the physical appearance of a design, such as its shape, color, and typography. Function refers to the purpose of a design, such as what it is trying to communicate or achieve. A good graphic design is both visually appealing and functional. It uses the right combination of form and function to communicate its message effectively. Graphic design is also a strategic and thoughtful craft. It requires careful planning and execution to create a design that is both effective and aesthetically pleasing." (Faith Aderemi, "The Essential Graphic Design Handbook", 2024)


23 June 2026

🖍️James G Scott - Collected Quotes

"A histogram is a great way to depict the distribution of a numerical variable. To construct one, we first partition the range of possible outcomes (here, temperatures) into a set of disjoint intervals ('bins'). Next, we count the number of cases that fall into each bin. Finally, we draw a rectangle over each bin whose height is equal to the count within each bin." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"A model is a metaphor, a description of a system that helps us to reason more clearly. Like all metaphors, models are approximations, and will never account for every last detail. A useful mantra here is: all models are wrong, but some models are useful." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"[...] always remember that the construction of an ANOVA table is inherently sequential. For example, first we add the clutter variable, which remains in the model at every subsequent step; then we add the distance variable, which remains in the model at every subsequent step; and so forth. Thus the actual question being answered at each stage of an analysis of variance is: how much variation in the response can this new variable predict, in the context of what has already been predicted by other variables in the model? This point - the importance of context in interpreting an ANOVA table - is subtle, but important." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"An obvious question is: do bootstrapped confidence intervals satisfy the frequentist coverage property? If your sample is fairly representative of the population, then the answer is a qualified yes. That is, the bootstrapping procedure yields nominal X% intervals that cover the true value 'approximately' X% of the time. Moreover, as the size of the original sample gets bigger, the quality of the approximation gets better. Alas, it is necessary to appeal to some very advanced probability theory to put both of these claims on firm footing." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"At the core of the resampling approach to statistical inference lies a simple idea. Most of the time, we can’t feasibly take repeated samples of size n from the population, to see how our estimate changes from one sample to the next. But we can repeatedly take samples of size n from the sample itself, and apply our estimator afresh to each notional sample. The idea is that the variability of the estimates across all these samples can be used to approximate our estimator’s true sampling distribution. This process - pretending that our sample is the whole population, and taking repeated samples of size n with replacement from our original sample of size n - is called bootstrap resampling, or just bootstrapping" (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"By themselves, sums of squares are hard to interpret, because they are measured in squared units of the Y variable. But their ratios are highly meaningful. In fact, the ratio of PV to TV - or what fraction of the total variation has been predicted by the model - is one of the most frequently quoted summary measures in all of statistical modeling. This ratio is called the coefficient of determination, and is usually denoted by the symbol R2 [...] The correct interpretation of R2 sometimes trips people up, and is therefore worth repeating: it is the proportion of variance in the data that can be predicted using the statistical model in question." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"[boxplots] allow you to assess variability both between and within the groups. [...] Each box shows the within-group variability, as measured by the interquartile range of the numerical variable (SAT score) for all cases in that category. The middle line within each box is the median of that category, and the differences between these medians give you a sense of the between-group variability. In this boxplot, the whiskers extend outside the box no further than 1.5 times the interquartile range. Points outside this interval are shown as individual dots." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Good estimators are those that usually yield estimates close to the truth, with minimal variation. Therefore, we typically summarize a sampling distribution using its standard deviation, which we refer to as the standard error. In quoting the standard error of an estimator’s sampling distribution, you are saying: 'If I were to take repeated samples from the population and use this estimatorfor every sample, my estimate is typically off from the truth by about this much.' Notice again that this is a claim about a procedure, not a particular estimate. The bigger the standard error, the less stable the estimator across different samples, and the less you can trust the estimate for any particular sample." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"In fitting statistical models, we typically equate the trustworthiness of a procedure with its stability under the influence of luck, and we seek to measure the degree to which that procedure might have given a different answer if the forces of randomness had made the world look a bit different. Specifically, the question we seek to answer is: 'if our data set had been different merely due to chance, would our answer have been different, too?'" (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Model-building requires much more than just technical knowledge of statistical ideas. It also requires care and judgment, and cannot be reduced to a flowchart, a table of formulas, or a tidy set of numerical summaries that wring every last drop of truth from a data set. There is almost never a single 'right' statistical model for some problem. But there are definitely such things as good models and bad models, and learning to tell the difference is important. Just remember: calling a model good or bad requires knowing both the tool and the task." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"[...] complexity sometimes comes at the expense of explanatory power. We must avoid building models calibrated so perfectly to past experience that they do not generalize to future cases." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"It is common to view a statistical model as nothing more than a recipe for calculating the fitted values, and to think that the residuals are just the errors made by this model. But we’ll have a richer picture if instead we view the residuals as part of the model. If you’ve ignored the variation in the residuals, then you really haven’t specified a complete forecast." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Resampling won’t yield the true sampling distribution of an estimator, but it is often good enough for approximating the standard error (which you’ll remember is just the standard deviation of the sampling distribution). We use the term bootstrapped standard error for the standard deviation of the bootstrapped sampling distribution. The bootstrapped standard error is an estimate of the true standard error." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Tables are almost always the best way to display categorical data sets with few classifying variables, for the simple reason that they convey a lot of information in a small space." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"The residuals from a regression model are sometimes called 'errors'. This is especially true in experimental science, where measurements of some Y variable will be taken at different values of the X variable (called design points), and where noisy measurement instruments can introduce random errors into theobservations. But in many cases this interpretation of a residual as an error can be misleading. A regression model can still give a nonzero residual, even if there is no mistake in the measurement of the Y variable. It’s often far more illuminating to think of the residual as the part of the Y variable that it is left unpredicted by X." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)