Showing posts with label truth. Show all posts

03 March 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 127: Understanding Facts in Modern AI)

Prompt Engineering Series


Prompt: "write a post of 600 words on what is meant by facts in nowadays AI and include an introduction, respectively a closing statement"

Introduction

As Artificial Intelligence (AI) becomes more deeply integrated into communication, research, and decision‑making, the question of what AI considers a fact has become increasingly important. People often assume that if an AI states something confidently, it must be true. But AI systems do not possess a stable internal database of verified truths. Instead, they generate responses by predicting what information is most likely to follow from patterns in the data they were trained on. Understanding how facts function in modern AI helps clarify why these systems can be powerful tools - and why they sometimes produce errors or fabrications.

What a 'Fact' Means for Humans

For humans, a fact is a statement that can be verified through observation, evidence, or reliable sources. Facts are:

  • Stable: they do not change depending on context.
  • Grounded: they refer to real‑world states or events.
  • Verifiable: they can be checked against evidence.
  • Independent: they exist whether or not someone remembers them.

Human understanding of facts is tied to reasoning, experience, and shared standards of truth.

How AI Models Handle Facts

AI systems do not have beliefs, memories, or understanding. They work by identifying statistical patterns in massive datasets. This leads to a different relationship with facts:

  • Facts are patterns: not stored entries but tendencies in the data.
  • Facts are probabilistic: the model generates what seems likely, not what is verified.
  • Facts are context‑sensitive: the same question phrased differently may yield different answers.
  • Facts are not inherently distinguished from non‑facts: the model does not “know” what is true; it only predicts what fits the pattern.

This is why AI can produce accurate information in one moment and incorrect information in another.
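To make the probabilistic nature of AI "facts" concrete, here is a minimal, purely illustrative Python sketch. The prompt, the candidate answers and the probabilities are invented for the example; a real model derives such a distribution from patterns in its training data, but the point is the same: the output is sampled from what is likely, not looked up in a store of verified truths.

```python
import random

# Hypothetical next-token probabilities a model might assign after the prompt
# "The capital of Australia is". The candidates and numbers are invented for
# illustration; a real model derives them from patterns in its training data.
NEXT_TOKEN_PROBS = {
    "Canberra": 0.62,    # the correct answer, but only the most likely one
    "Sydney": 0.30,      # a frequent association in text, not a verified fact
    "Melbourne": 0.08,
}

def sample_answer(probs: dict) -> str:
    """Pick a completion in proportion to its probability, as sampling-based
    decoding does. Nothing here checks the answer against reality."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

if __name__ == "__main__":
    # The same "question" asked five times can yield different answers,
    # because the model predicts what is likely, not what is verified.
    print([sample_answer(NEXT_TOKEN_PROBS) for _ in range(5)])
```

Running the sketch usually returns the correct answer, but not always, which mirrors why the same prompt can produce an accurate statement one moment and an incorrect one the next.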

The Fragility of AI Facts

Because AI relies on statistical inference, several factors can distort factual accuracy:

  • Training data limitations: if the data is outdated, incomplete, or biased, the model’s 'facts' reflect those flaws.
  • Ambiguous prompts: unclear questions can lead to confident but incorrect answers.
  • Lack of real‑time grounding: unless connected to external sources, AI cannot update facts after training.
  • Hallucinations: the model may generate plausible‑sounding but false statements when patterns are weak or conflicting.

These issues highlight that AI does not know facts; it reconstructs them.

Why AI Can Still Be Factually Useful

Despite these limitations, AI can be highly effective at working with factual information when used appropriately. Its strengths include:

  • Synthesizing large volumes of data: AI can integrate information from many sources at once.
  • Recognizing factual patterns: it can identify common knowledge across diverse texts.
  • Retrieving structured information: when connected to verified databases or tools, it can provide up‑to‑date facts. 
  • Supporting human fact‑checking: AI can surface relevant details quickly, which humans can then verify.

In this sense, AI acts as a fact assistant, not a fact authority.
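As a rough illustration of that "fact assistant" role, the sketch below checks a model's draft answer against a small, human-curated reference before it is relied upon. The reference table, the questions and the function names are hypothetical placeholders, not a real grounding API; the flag it returns is a cue for human verification, not a replacement for it.

```python
# A minimal sketch of the "fact assistant" pattern: a model's draft answer is
# compared against a small, human-curated reference before it is relied upon.
# The reference table, the questions and the function names are hypothetical.
VERIFIED_FACTS = {
    "capital_of_australia": "Canberra",
    "boiling_point_of_water_c": "100",
}

def model_draft(question_key: str) -> str:
    """Stand-in for a model response; in practice this would call an LLM."""
    drafts = {
        "capital_of_australia": "Sydney",      # plausible-sounding but wrong
        "boiling_point_of_water_c": "100",     # happens to match the reference
    }
    return drafts.get(question_key, "unknown")

def grounded_answer(question_key: str):
    """Return the draft plus a flag showing whether it matched the verified
    source; a human still decides what to do with a mismatch."""
    draft = model_draft(question_key)
    verified = VERIFIED_FACTS.get(question_key)
    return draft, (verified is not None and draft == verified)

if __name__ == "__main__":
    for key in ("capital_of_australia", "boiling_point_of_water_c"):
        answer, matches = grounded_answer(key)
        print(f"{key}: {answer} (matches verified source: {matches})")
```

The design choice worth noting is that the model never updates the reference table itself: the verified source and the final judgment stay on the human side.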

The Human Role in Defining Facts for AI

Because AI cannot distinguish truth from falsehood on its own, humans play a crucial role in shaping factual accuracy:

  • Curating training data: selecting high‑quality, diverse, and reliable sources.
  • Building guardrails: designing systems that avoid unsupported claims.
  • Providing feedback: correcting errors to improve future performance.
  • Maintaining oversight: verifying outputs before relying on them for decisions.

AI becomes more reliable when humans treat it as a collaborator rather than an oracle.

Closing Statement

Facts in modern AI are not fixed truths stored inside a machine but statistical echoes of the data used to train it. Understanding this distinction helps set realistic expectations: AI can be a powerful tool for accessing and organizing information, but it cannot replace human judgment, verification, or critical thinking. As AI continues to evolve, the challenge is to build systems that handle facts responsibly - and to ensure that humans remain the final arbiters of truth.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


14 December 2024

🧭💹Business Intelligence: Perspectives (Part 21: Data Visualization Revised)

Data Visualization Series

Creating data visualizations has nowadays become so easy that anybody can do it with a minimum of effort and knowledge, which on one side is great for the creators but can easily become a nightmare for the readers, respectively the users. Just dumping data into visuals can barely be called data visualization, even if the result is considered as such. The problems of visualization are multiple – the lack of data culture, the lack of understanding of processes, data and their characteristics, the lack of ability to define and model problems, the lack of educating the users, the lack of managing expectations, etc.

There are many books on data visualization, though they seem an expensive commodity for those who want rapid enlightenment, and the illusion of knowing often proves to be a barrier. It's also true that many datasets are so dull that the lack of information and meaning is compensated for by adding elements that give a kitsch look-and-feel (aka chartjunk), shifting the attention from the valuable elements to the decorations. So, how do we overcome the various challenges?

Probably the most important step when visualizing data is to define the primary purpose of the end product. Is it to inform, to summarize or to navigate the data, to provide different perspectives at macro and micro level, to help discovery, to explore, to sharpen the questions, to make people think, respectively understand, to carry a message, to be artistic or to represent the reality truthfully, or is it maybe just a filler or point of attraction in textual content?

Clarifying the initial purpose is important because it makes the motives and expectations explicit upfront, allowing one to determine the further requirements and characteristics, and maybe to set some limits concerning the time spent and the quantitative and/or qualitative criteria against which the end result should eventually be evaluated. Narrowing down such aspects helps in planning and in the further steps performed.

Many of the steps are repetitive, and past experience can help reduce the overall effort. Therefore, professionals in the field, driven by intuition and experience, probably don't always need to go through the full extent of the process. Conversely, what is learned and done poorly has a high chance of delivering poor quality.

A visualization can be considered effective when it serves the intended purpose(s), when it reveals with minimal effort the patterns, issues or facts hidden in the data, and when it allows people to explore the data, ask questions and find answers. One can also talk about efficiency, especially when readers can see at a glance the many aspects encoded in the visualization. However, the more the discovery process depends on data navigation via filters or other techniques, the more difficult it becomes to talk about efficiency.

Better criteria for judging visualizations are whether they are meaningful and useful for the readers, and whether the readers understood the authors' intent and its further intrinsic implications. Multiple characteristics can be associated with these criteria: clarity, specificity, correctness, truthfulness, appropriateness, simplicity, etc. All of these matter to a lower or higher degree depending on the broader context of the visualization.

All of these must be weighed in the bigger picture when creating visualizations, though there are probably also exceptions, especially on the artistic side, where artists can cut corners to create an artistic effect; even here, though, the authors need to be truthful to the data and make sure that their work doesn't distort the facts excessively. Failing to do so might not have an important impact in the short term, though over time the effects can ripple in unexpected ways.


20 March 2021

🧭Business Intelligence: New Technologies, Old Challenges (Part II - ETL vs. ELT)

 

Business Intelligence

Data lakes and similar cloud-based repositories drove the requirement of loading the raw data before performing any transformations on it. At least that's the approach the new wave of ELT (Extract, Load, Transform) technologies use to handle analytical and data integration workloads, which is probably recommendable for the mentioned cloud-based contexts. ELT technologies are especially relevant when there is a need to handle data with high velocity, variance, validity or different values of truth (aka big data), because they allow processing the workloads over architectures that can be scaled with the workloads' demands.

This is probably the most important aspect, even if there can be further advantages, like built-in connectors to a wide range of sources or complex data flow controls. ETL (Extract, Transform, Load) tools have the same capabilities, maybe limited to certain data sources, though their newer versions seem to bridge the gap.
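To make the order-of-operations difference between the two approaches concrete, here is a minimal Python sketch with hypothetical, in-memory stand-ins for the source, the target and the transformation logic; real pipelines would of course use an ETL/ELT tool or a SQL engine, but the contrast between transforming before loading and loading the raw data first is the same.

```python
# Minimal sketch of the ETL vs. ELT order of operations, using in-memory
# stand-ins for the source system, the target repository and the shared
# transformation logic. All names and sample rows are hypothetical.
RAW_ORDERS = [
    {"id": 1, "amount": "100.50", "country": "de"},
    {"id": 2, "amount": "80.00", "country": "DE"},
]

def transform(row: dict) -> dict:
    """Shared conversion/cleaning logic: cast amounts, normalize codes."""
    return {"id": row["id"],
            "amount": float(row["amount"]),
            "country": row["country"].upper()}

def etl(source: list) -> list:
    """ETL: transform first, then load only the prepared data into the target."""
    return [transform(row) for row in source]

def elt(source: list) -> tuple:
    """ELT: load the raw data unchanged, then transform it later inside the
    target (here simulated by a second in-memory 'layer'), on demand."""
    raw_layer = list(source)                               # load step: no changes
    curated_layer = [transform(row) for row in raw_layer]  # later transformation
    return raw_layer, curated_layer

if __name__ == "__main__":
    print("ETL target:", etl(RAW_ORDERS))
    print("ELT raw + curated layers:", elt(RAW_ORDERS))
```

The transformation code is identical in both branches; what differs is where and when it runs, which is essentially the trade-off discussed in the rest of this post.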

One of the most stressed advantages of ELT is the possibility of having all the (business) data in the repository, though this is not a technological advantage. The same can be obtained via ETL tools, even if it might in some cases involve a bigger effort, an effort depending on the functionality existing in each tool. It's true that ETL solutions have a narrower scope by loading only a subset of the available data, or that transformations are made before loading the data, though this depends on the scope considered while building the data warehouse or data mart, respectively on the design of the ETL packages; both are a matter of choice, choices that can be traced back to business requirements or technical best practices.

Some of the advantages claimed are context-dependent – the context in which the technologies are used, respectively in which the problems are solved. It is often held against ETL solutions that the available data are already prepared (aggregated, converted) and that new requirements will drive additional effort. On the other side, in ELT-based solutions all the data are made available and eventually further transformed, but here too the level of transformation depends on the specific requirements. Independently of the approach used, the data are still available if needed, though they involve a certain effort for further processing.

Building usable and reliable data models depends on good design, and in the design process reside the most important challenges. In theory, some think that in ETL scenarios the design is done beforehand, though that's not necessarily true. One can pull the raw data from the source and build the data models in the target repositories.

Data conversion and cleaning are needed under both approaches. In some scenarios it is ideal to do this upfront, minimizing the effect these processes have on the data's usage, while in other scenarios it's helpful to address them later in the process, with the risk that each project will address them differently. This can become an issue and should ideally be addressed by design (e.g. by building an intermediate layer) or at least organizationally (e.g. by enforcing best practices).

Claiming that ELT is better just because the data are true (being in raw form) can be taken only as a marketing slogan. The degree of truth the data have depends on the way they reflect the business' processes and the way they are maintained, while their quality is judged entirely against their intended use. Even if raw data allow more flexibility in handling the various requests, the challenges involved in processing them can be neglected only at the cost of the consequences that follow from this.

Looking at the cloud-based analytics and data integration technologies, they seem to allow both approaches; building optimal solutions thus relies on the professionals' wisdom in making appropriate choices.


02 December 2018

🔭Data Science: All Models Are Wrong (Just the Quotes)

“[…] no models are [true] - not even the Newtonian laws. When you construct a model you leave out all the details which you, with the knowledge at your disposal, consider inessential. […] Models should not be true, but it is important that they are applicable, and whether they are applicable for any given purpose must of course be investigated. This also means that a model is never accepted finally, only on trial.” (Georg Rasch, “Probabilistic Models for Some Intelligence and Attainment Tests”, 1960)

“Celestial navigation is based on the premise that the Earth is the center of the universe. The premise is wrong, but the navigation works. An incorrect model can be a useful tool.” (R A J Phillips, “A Day in the Life of Kelvin Throop”, Analog Science Fiction and Science Fact, Vol. 73 No. 5, 1964)

“Since all models are wrong the scientist cannot obtain a ‘correct’ one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.” (George Box, “Science and Statistics", Journal of the American Statistical Association 71, 1976)

“A model of the universe does not require faith, but a telescope. If it is wrong, it is wrong.” (Paul C W Davies, “Space and Time in the Modern Universe”, 1977)

"Competent scientists do not believe their own models or theories, but rather treat them as convenient fictions. […] The issue to a scientist is not whether a model is true, but rather whether there is another whose predictive power is enough better to justify movement from today's fiction to a new one." (Steve Vardeman," Comment", Journal of the American Statistical Association 82, 1987)

“The fact that [the model] is an approximation does not necessarily detract from its usefulness because models are approximations. All models are wrong, but some are useful.” (George Box, 1987)

"Statistical models for data are never true. The question whether a model is true is irrelevant. A more appropriate question is whether we obtain the correct scientific conclusion if we pretend that the process under study behaves according to a particular statistical model." (Scott Zeger, "Statistical reasoning in epidemiology", American Journal of Epidemiology, 1991)

“[…] it does not seem helpful just to say that all models are wrong. The very word model implies simplification and idealization. The idea that complex physical, biological or sociological systems can be exactly described by a few formulae is patently absurd. The construction of idealized representations that capture important stable aspects of such systems is, however, a vital part of general scientific analysis and statistical models, especially substantive ones, do not seem essentially different from other kinds of model.” (Sir David Cox, "Comment on ‘Model uncertainty, data mining and statistical inference’", Journal of the Royal Statistical Society, Series A 158, 1995)

“I do not know that my view is more correct; I do not even think that ‘right’ and ‘wrong’ are good categories for assessing complex mental models of external reality - for models in science are judged [as] useful or detrimental, not as true or false.” (Stephen Jay Gould, “Dinosaur in a Haystack: Reflections in Natural History”, 1995)

“No matter how beautiful the whole model may be, no matter how naturally it all seems to hang together now, if it disagrees with experiment, then it is wrong.” (John Gribbin, “Almost Everyone’s Guide to Science”, 1999)

“A model is a simplification or approximation of reality and hence will not reflect all of reality. […] Box noted that ‘all models are wrong, but some are useful’. While a model can never be ‘truth’, a model might be ranked from very useful, to useful, to somewhat useful to, finally, essentially useless.” (Kenneth P Burnham & David R Anderson, “Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach” 2nd Ed., 2005)

"You might say that there’s no reason to bother with model checking since all models are false anyway. I do believe that all models are false, but for me the purpose of model checking is not to accept or reject a model, but to reveal aspects of the data that are not captured by the fitted model." (Andrew Gelman, "Some thoughts on the sociology of statistics", 2007)

"First, we affirm that all models are wrong, some of them are useful. Since a model is an abstraction of reality, and that too only from a particular perspective, they are fundamentally wrong because they are not reality. That gives no license to models that are wrongly built - after all, two wrongs don’t make a right. So usefulness, or purpose, is what determines a model’s role, given that it is correctly formed. Models therefore have teleological value even though they are ontologically erroneous." (John Boardman & Brian Sauser, "Systems Thinking: Coping with 21st Century Problems", 2008)

“In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world: our models are not the reality - a point well made by George Box in his oft-cited remark that “all models are wrong, but some are useful”. (David Hand, "Wonderful examples, but let's not close our eyes", Statistical Science 29, 2014)

"A model is a metaphor, a description of a system that helps us to reason more clearly. Like all metaphors, models are approximations, and will never account for every last detail. A useful mantra here is: all models are wrong, but some models are useful." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

