Showing posts with label certainty.

08 March 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 130: What Measurement Means in Today’s AI Through the DIKW Pyramid)

Prompt Engineering Series


Prompt: "write a post of 600 words on what is meant by measurement in nowadays AI considered from the perspective of DIKW pyramid and include an introduction, respectively a closing statement"

Introduction

In the rapidly evolving world of Artificial Intelligence, 'measurement' has become a central concept - yet it is often misunderstood. We measure model performance, data quality, user behavior, risks, and outcomes. But measurement in AI is not just about numbers or metrics; it is about understanding how information transforms into meaningful action. One of the most useful frameworks for exploring this transformation is the DIKW pyramid, which describes the progression from Data to Information, Knowledge, and ultimately Wisdom. By examining measurement through this lens, we gain a deeper appreciation of how AI systems interpret the world and how we evaluate their effectiveness, reliability, and impact.

Measurement at the Data Level

At the base of the DIKW pyramid lies data - raw, unprocessed facts. In AI, measurement at this level focuses on quantity, diversity, and quality. We measure:

  • The size of datasets
  • The distribution of samples
  • The presence of noise or bias
  • The completeness and representativeness of inputs

These measurements matter because data is the foundation of all AI learning. Poorly measured or poorly curated data leads to unreliable models. At this stage, measurement is largely descriptive: it tells us what is there, not what it means.
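The descriptive measurements listed above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the function name `describe_dataset`, the record layout, and the sample rows are all hypothetical and made up for the example.

```python
from collections import Counter


def describe_dataset(rows, label_key):
    """Summarize a list of dict records: size, label distribution, missing rate per field."""
    if not rows:
        raise ValueError("rows must not be empty")
    size = len(rows)
    # Share of each label: a rough view of how samples are distributed.
    label_counts = Counter(row.get(label_key) for row in rows)
    distribution = {label: count / size for label, count in label_counts.items()}
    # Share of missing (None or absent) values per field: a rough view of completeness.
    fields = {key for row in rows for key in row}
    missing_rate = {
        field: sum(1 for row in rows if row.get(field) is None) / size
        for field in fields
    }
    return {"size": size, "distribution": distribution, "missing_rate": missing_rate}


# Hypothetical records, invented purely for illustration.
rows = [
    {"text": "good product", "label": "pos"},
    {"text": "bad service", "label": "neg"},
    {"text": None, "label": "pos"},
    {"text": "works fine", "label": "pos"},
]
summary = describe_dataset(rows, "label")
print(summary["size"])          # 4
print(summary["distribution"])  # share of "pos" is 0.75, share of "neg" is 0.25
print(summary["missing_rate"])  # "text" is missing in a quarter of the rows
```

Such a summary only tells us what is there (size, balance, gaps); whether a 75/25 split is acceptable still depends on the task.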

Measurement at the Information Level

When data is organized and contextualized, it becomes information. Measurement here shifts from raw counts to meaningful patterns. For AI, this includes:

  • Accuracy, precision, recall, and other performance metrics
  • Error rates across different demographic or contextual groups
  • Model confidence scores
  • Feature importance or contribution

This is where measurement begins to answer questions rather than simply describe inputs. It helps us understand how the model behaves and where it succeeds or fails. Information-level measurement is essential for transparency and accountability.
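As a rough sketch of the information-level metrics named above, the following Python computes accuracy, precision, and recall for a binary classifier, plus the error rate per group. The helper names, labels, predictions, and group assignments are hypothetical and chosen only to show how an overall score can hide differences between groups.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def error_rate_by_group(y_true, y_pred, groups):
    """Fraction of wrong predictions within each group."""
    totals, errors = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        errors[g] = errors.get(g, 0) + (1 if t != p else 0)
    return {g: errors[g] / totals[g] for g in totals}


# Hypothetical labels, predictions, and group membership (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
groups = ["a", "a", "b", "b", "a", "b", "a", "a"]

metrics = classification_metrics(y_true, y_pred)
rates = error_rate_by_group(y_true, y_pred, groups)
print(metrics)  # accuracy, precision, and recall are all 0.75 here
print(rates)    # group "a" has no errors, group "b" is wrong on 2 of 3 samples
```

In this made-up case the overall accuracy of 0.75 looks acceptable, while all errors fall on group "b", which is the kind of pattern per-group measurement is meant to surface.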

Measurement at the Knowledge Level

Knowledge emerges when information is interpreted and connected to real-world meaning. Measurement at this level becomes more complex and more human-centered. It includes:

  • How well the AI generalizes to new situations
  • Whether the model’s outputs align with domain expertise
  • The stability and robustness of predictions
  • The system’s ability to adapt to changing conditions

Here, measurement is no longer just technical - it becomes contextual. We evaluate whether the AI system is useful, reliable, and appropriate for the environment in which it operates. Knowledge-level measurement bridges the gap between technical performance and practical value.
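Two of the knowledge-level questions above, generalization and stability, can be approximated numerically. The sketch below is one possible approach under strong simplifying assumptions: `toy_model` is a hypothetical one-dimensional threshold rule, and the "seen" and "unseen" examples are invented for illustration.

```python
import random


def accuracy(model, examples):
    """Share of (input, label) pairs the model labels correctly."""
    return sum(1 for x, y in examples if model(x) == y) / len(examples)


def generalization_gap(model, seen, unseen):
    """Accuracy on familiar data minus accuracy on new data; larger means worse generalization."""
    return accuracy(model, seen) - accuracy(model, unseen)


def prediction_stability(model, inputs, noise=0.05, trials=20, seed=0):
    """Share of predictions that stay the same when inputs are slightly perturbed."""
    rng = random.Random(seed)
    unchanged = 0
    for x in inputs:
        base = model(x)
        unchanged += sum(
            1 for _ in range(trials)
            if model(x + rng.uniform(-noise, noise)) == base
        )
    return unchanged / (len(inputs) * trials)


def toy_model(x):
    """Hypothetical stand-in for a trained model: a fixed threshold at 0.5."""
    return 1 if x >= 0.5 else 0


# Invented examples: the "unseen" set contains cases the threshold rule gets wrong.
seen = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
unseen = [(0.45, 1), (0.55, 0), (0.3, 0), (0.7, 1)]

gap = generalization_gap(toy_model, seen, unseen)
stability = prediction_stability(toy_model, [0.1, 0.9])
print(gap)        # 0.5: perfect on seen data, half right on unseen data
print(stability)  # 1.0: inputs far from the threshold are not affected by small noise
```

Numbers like these support, but do not replace, the contextual judgment described above: whether a gap of 0.5 is tolerable depends on the environment the system operates in.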

Measurement at the Wisdom Level

At the top of the DIKW pyramid lies wisdom, which involves judgment, ethics, and long-term thinking. Measurement at this level is the most challenging because it touches on human values. It includes:

  • Whether the AI supports responsible decision-making
  • Whether it avoids harm and respects boundaries
  • Whether it encourages human oversight when needed
  • Whether its behavior aligns with societal norms and expectations

Wisdom-level measurement is not about numbers - it is about evaluating the quality of the AI’s role in human life. It asks whether the system contributes positively to well-being, fairness, and trust.

Why the DIKW Perspective Matters

Viewing measurement through the DIKW pyramid highlights a crucial insight: not all measurements are equal. Technical metrics alone cannot capture the full impact of AI. As systems become more integrated into society, we must measure not only how they perform, but how they behave, how they influence decisions, and how they align with human values.

This layered approach also reminds us that measurement becomes more interpretive and more ethical as we move up the pyramid. It shifts from counting data points to evaluating consequences.

Closing Statement

Measurement in today’s AI is far more than a technical exercise - it is a multidimensional process that spans the entire DIKW pyramid. From raw data to informed judgment, each layer requires its own methods, its own questions, and its own standards. By understanding measurement through this framework, we gain a clearer view of how AI systems learn, how they operate, and how they shape human experience. Ultimately, meaningful measurement is what allows AI to evolve responsibly, supporting not just intelligent systems but wise and trustworthy ones.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


07 March 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 129: Understanding Data in Modern AI)

Prompt Engineering Series

Prompt: "write a post of 600 words on what is meant by data in nowadays AI and include an introduction, respectively a closing statement"

Introduction

In the age of Artificial Intelligence (AI), data has become one of the most frequently used - and frequently misunderstood - terms. We hear that 'data is the new oil', that AI systems are 'trained on massive datasets', or that 'better data means better models'. But what does data actually mean in the context of modern AI? And why has it become such a central concept in shaping the capabilities, limitations, and responsibilities of today’s intelligent systems? Understanding the role of data is essential for anyone who wants to grasp how AI works, how it learns, and how it influences the world around us.

Data as the Foundation of AI

At its core, data is the raw material that fuels AI. It includes text, images, audio, video, sensor readings, logs, and countless other forms of digital information. Modern AI systems - especially those based on machine learning - do not learn through explicit instructions. Instead, they detect patterns in large volumes of data and use those patterns to generate predictions, classifications, or responses.

In this sense, data is not just input; it is experience. The breadth, diversity, and quality of the data shape the system’s understanding of the world. A model trained on narrow or biased data will reflect those limitations. A model trained on broad, diverse, and well‑curated data will be more capable, more robust, and more aligned with real‑world complexity.

The Many Forms of Data in Today’s AI

1. Training Data

Training data is the information used to teach AI systems how to perform tasks. For language models, this includes text from books, articles, websites, and other publicly available sources. For image models, it includes labeled pictures. Training data determines what the model can recognize, how well it generalizes, and where it might struggle.

2. Evaluation Data

Evaluation data is used to test how well an AI system performs. It helps developers measure accuracy, fairness, safety, and reliability. Good evaluation data is diverse and representative, ensuring that the model is tested on a wide range of scenarios.

3. Real‑Time or Operational Data

Some AI systems use real‑time data to adapt to changing conditions - for example, navigation apps that adjust routes based on traffic patterns. This type of data helps AI remain relevant and responsive.

4. Metadata and Contextual Data

Metadata - information about data - plays a growing role in AI. It includes timestamps, geolocation, device type, or other contextual clues that help systems interpret meaning more accurately.

Why Data Quality Matters

In modern AI, the quality of data often matters more than the quantity. High‑quality data is:

  • Accurate
  • Representative
  • Diverse
  • Ethically sourced
  • Free from harmful biases

Poor‑quality data can lead to unreliable outputs, unfair outcomes, or unsafe behavior. This is why responsible data curation has become a central part of AI development.

Critical Aspects of Data in Today’s AI

Bias and Fairness

Data reflects the world - and the world contains inequalities. If these patterns are not addressed, AI systems can unintentionally reinforce them. Ensuring fairness requires careful analysis, diverse datasets, and ongoing monitoring.

Privacy and Consent

Modern AI must respect privacy. This means handling personal information responsibly, minimizing data collection, and ensuring that data use aligns with legal and ethical standards.

Transparency

Users increasingly expect clarity about how data is used, where it comes from, and how it influences AI behavior. Transparency builds trust and helps people understand the system’s boundaries.

Security

Data must be protected from misuse, unauthorized access, and manipulation. Strong security practices are essential for maintaining integrity and trust.

Closing Statement

In today’s AI landscape, data is far more than digital information - it is the foundation upon which intelligent systems are built. Understanding what data means, how it is used, and why its quality matters is essential for navigating the opportunities and challenges of modern AI. As these systems continue to evolve, the way we collect, curate, and protect data will shape not only the capabilities of AI but also its impact on society. By treating data with care, responsibility, and respect, we ensure that AI develops in ways that are trustworthy, fair, and beneficial for everyone.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


🤖〽️Prompt Engineering: Copilot Unabridged (Part 128: Understanding Wisdom in Modern AI)

Prompt Engineering Series


Prompt: "write a post of 600 words on what is meant by wisdom in nowadays AI and include an introduction, respectively a closing statement"

Introduction

In an era where Artificial Intelligence (AI) is woven into nearly every aspect of modern life - from search engines and medical diagnostics to creative tools and autonomous systems - the question of wisdom has become more than philosophical curiosity. It has become a practical necessity. We often talk about AI in terms of intelligence, speed, accuracy, or scale. But intelligence alone doesn’t guarantee good judgment. As AI systems grow more capable, the idea of 'wisdom' is emerging as a crucial benchmark for how these systems should behave, support human decision-making, and integrate into society responsibly. Understanding what wisdom means in the context of AI helps us navigate not only what these systems can do, but what they should do.

The Shift from Intelligence to Wisdom

Traditionally, AI has been evaluated by its ability to process information, recognize patterns, and generate outputs that mimic human reasoning. These are impressive feats, but they represent only a narrow slice of what humans consider wise behavior. Wisdom involves context, restraint, empathy, and long-term thinking - qualities that go beyond raw computational power.

In today’s AI, wisdom is less about knowing everything and more about knowing how to act in ways that align with human values. This includes understanding nuance, recognizing uncertainty, and avoiding overconfidence. A wise AI doesn’t just provide answers; it helps people think more clearly, make informed choices, and avoid harm.

Context Awareness and Judgment

One of the most important aspects of wisdom in AI is contextual sensitivity. Human wisdom depends heavily on understanding the situation, the people involved, and the potential consequences. Modern AI systems are beginning to approximate this by analyzing patterns across vast amounts of data, but true wisdom requires more than correlation.

A wise AI system recognizes when a question touches on sensitive areas - such as health, safety, or emotional well‑being - and adjusts its behavior accordingly. It knows when to provide information, when to encourage human involvement, and when to step back. This kind of judgment is essential for building trust and ensuring that AI supports, rather than replaces, human agency.

Ethical Alignment and Responsibility

Wisdom also implies a moral dimension. In the context of AI, this means aligning system behavior with ethical principles such as fairness, transparency, and respect for human dignity. A wise AI system avoids amplifying harmful biases, respects privacy, and is designed to prevent misuse.

This doesn’t mean AI becomes a moral authority. Instead, it means AI is built to operate within guardrails that reflect widely shared human values. Wisdom in AI is about responsibility - ensuring that powerful tools are used in ways that benefit people and minimize harm.

Humility and Uncertainty

Perhaps surprisingly, one of the most important traits of wisdom is humility. A wise person knows the limits of their knowledge, and the same should be true for AI. Today’s most advanced systems incorporate mechanisms to acknowledge uncertainty, avoid speculation, and encourage users to seek human expertise when appropriate.

This humility is not a weakness; it is a safeguard. It prevents AI from presenting guesses as facts and helps maintain a healthy balance between automation and human judgment.
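One simple way to picture this safeguard is a confidence threshold below which a system declines to answer. The sketch below is an assumption made for illustration, not a description of how Copilot or any particular product works; the function name `answer_or_defer`, the threshold of 0.8, and the probabilities are all hypothetical.

```python
def answer_or_defer(probabilities, threshold=0.8):
    """Return the most likely answer, or defer when confidence is below the threshold."""
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence < threshold:
        # Not confident enough: do not present a guess as a fact.
        return {
            "answer": None,
            "confidence": confidence,
            "note": "uncertain; consider consulting a human expert",
        }
    return {"answer": label, "confidence": confidence, "note": ""}


# Hypothetical probability estimates for two candidate answers.
confident = answer_or_defer({"benign": 0.95, "malignant": 0.05})
unsure = answer_or_defer({"benign": 0.55, "malignant": 0.45})
print(confident["answer"])  # benign
print(unsure["answer"])     # None, because 0.55 is below the 0.8 threshold
```

The threshold is a design choice: raising it makes the system defer more often, which trades convenience for caution.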

Closing Statement

As AI becomes more deeply embedded in society, the concept of wisdom offers a powerful lens for understanding what we truly need from these systems. Intelligence alone is no longer enough. We need AI that exercises judgment, respects human values, and supports thoughtful decision‑making. Wisdom in AI is ultimately about partnership - creating systems that enhance human capability while honoring the complexity of human life. By striving for wisdom, not just intelligence, we shape a future where AI contributes to a more thoughtful, responsible, and humane world.

Disclaimer: The whole text was generated by Copilot at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


24 September 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 100: Why AI Is an Investment Bubble)

Prompt Engineering Series

Prompt: "write a post of 600 words on why AI is an investment bubble"

Introduction

Artificial Intelligence (AI) is the buzzword of the decade. From chatbots to autonomous vehicles, investors are pouring billions into AI startups, infrastructure, and research. Stock prices of AI-adjacent companies have soared, venture capital is flowing freely, and headlines proclaim a technological revolution. But beneath the hype lies a growing concern: AI may be the next big investment bubble.

Much like the dot-com boom of the late 1990s, the AI frenzy is driven more by speculation than substance. Here’s why the current wave of AI investment may be inflating a bubble that’s bound to burst.

1. Valuations Detached from Reality

Many AI startups are being valued at billions despite having little to no revenue, unproven business models, or products still in development. Investors are betting on potential rather than performance. This speculative behavior mirrors past bubbles - where companies were funded not for what they had built, but for what they promised to build.

In some cases, companies with minimal AI capabilities are rebranding themselves as 'AI-powered' to attract funding. The term 'AI' has become a magnet for capital, regardless of technical depth or market viability.

2. Overpromising, Underdelivering

AI is powerful - but it’s not magic. Many investors and executives misunderstand its limitations. They expect general intelligence, flawless automation, and instant productivity gains. In reality, most AI systems are narrow, brittle, and require massive data and compute resources to function.

The gap between expectation and reality is widening. When AI fails to deliver on inflated promises - whether in healthcare, finance, or customer service - disillusionment sets in. This pattern of hype followed by disappointment is a classic bubble indicator.

3. Unsustainable Infrastructure Costs

Training large AI models requires enormous computational power, energy, and water. The cost of maintaining data centers and GPUs is skyrocketing. While tech giants can absorb these expenses, smaller players cannot. Many startups are burning through cash just to keep their models running.

This creates a fragile ecosystem where profitability is elusive. If funding dries up or operational costs spike, many AI ventures may collapse under their own weight.

4. Herd Mentality and Fear of Missing Out (FOMO) 

The AI gold rush is fueled by fear of missing out. Investors don’t want to be left behind, so they chase deals without due diligence. This herd mentality inflates valuations and distorts market signals.

When too much capital chases too few viable ideas, bubbles form. Eventually, reality catches up - and the correction can be brutal.

5. Regulatory Uncertainty

Governments around the world are scrambling to regulate AI. From data privacy to algorithmic bias, the legal landscape is shifting. New rules could restrict deployment, increase compliance costs, or ban certain applications altogether.

This regulatory uncertainty adds risk to AI investments. Companies built on shaky ethical or legal ground may find themselves suddenly nonviable.

Conclusion: Innovation vs. Inflation

AI is not a scam - it’s a transformative technology with real potential. But the current investment climate is inflating expectations beyond what the technology can sustainably deliver. Like all bubbles, it’s driven by emotion, speculation, and a distorted sense of urgency.

When the dust settles, many overhyped ventures will vanish. But the survivors - those with real value, ethical foundations, and sustainable models - will shape the future of AI.

Investors must distinguish between innovation and inflation. Because in the world of AI, not every 'intelligent' bet is a smart one.

Just try the prompt on Copilot or your favorite AI-powered assistant! Have you got a different/similar result? How big or important is the difference? Any other thoughts?
Just share the link to the post with me and I'll add it to this post as a resource!

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


29 August 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 82: Predicting the Future of AI - With AI)

Prompt Engineering Series

Prompt: "write a post of 600 words on how AI can be used to predict its evolution"

Introduction

Artificial Intelligence (AI) is evolving at a breathtaking pace, reshaping industries, societies, and even our understanding of intelligence itself. But as AI becomes more complex and autonomous, a fascinating question arises: Can AI be used to predict its own evolution?

The answer is yes - and it’s already happening. By analyzing trends, modeling innovation cycles, and simulating future scenarios, AI is becoming a powerful tool not just for solving problems, but for forecasting its own trajectory.

Learning from the Past to Predict the Future

AI systems excel at pattern recognition. By ingesting historical data on technological breakthroughs, research publications, patent filings, and funding flows, AI can identify the signals that precede major leaps in capability.

For example:

  • Natural language models can analyze scientific literature to detect emerging themes in AI research.
  • Machine learning algorithms can forecast the rate of improvement in benchmarks like image recognition, language translation, or autonomous navigation.
  • Knowledge graphs can map relationships between technologies, institutions, and innovations to anticipate convergence points.

This isn’t just speculation - it’s data-driven foresight.

Modeling Innovation Cycles

AI can also be used to model the dynamics of innovation itself. Techniques like system dynamics, agent-based modeling, and evolutionary algorithms allow researchers to simulate how ideas spread, how technologies mature, and how breakthroughs emerge.

These models can incorporate variables such as:

  • Research funding and policy shifts
  • Talent migration across institutions
  • Hardware and compute availability
  • Public sentiment and ethical debates

By adjusting these inputs, AI can generate plausible futures - scenarios that help policymakers, technologists, and ethicists prepare for what’s next.

Predicting Capability Growth

One of the most direct applications is forecasting the growth of AI capabilities. For instance:

  • Performance extrapolation: AI can analyze past improvements in model accuracy, speed, and generalization to estimate future milestones.
  • Architecture simulation: Generative models can propose new neural network designs and predict their theoretical performance.
  • Meta-learning: AI systems can learn how to learn better, accelerating their own development and hinting at the pace of future evolution.

This recursive forecasting - AI predicting AI - is a hallmark of the field’s increasing sophistication.
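Performance extrapolation, in its simplest form, amounts to fitting a trend to past results and projecting it forward. The following sketch uses an ordinary least-squares line; the years and benchmark scores are invented for illustration and do not come from any real benchmark, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up benchmark scores (percent) per year, illustrative only.
years = [2019, 2020, 2021, 2022]
scores = [70.0, 74.0, 78.0, 82.0]

slope, intercept = fit_line(years, scores)
forecast_2024 = slope * 2024 + intercept
print(slope)          # 4.0 points per year on this toy data
print(forecast_2024)  # 90.0
```

A straight line is a deliberately naive choice: real benchmarks saturate (a percentage score cannot exceed 100) and progress often comes in jumps, so a projection like this is at best a rough guide.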

Challenges and Uncertainties

Despite its promise, predicting AI’s evolution is fraught with uncertainty:

  • Emergent behavior: AI systems often behave in unexpected ways, making long-term predictions difficult.
  • Black box models: Many AI systems lack interpretability, which limits our ability to understand their reasoning.
  • Ethical and societal variables: Human decisions - about regulation, deployment, and trust - can dramatically alter AI’s path.

AI can model probabilities, but it cannot guarantee outcomes. Its predictions must be treated as guidance, not gospel.

Governance and Strategic Planning

Using AI to predict its own evolution isn’t just a technical exercise - it’s a governance imperative. Governments, corporations, and research institutions can use these forecasts to:

  • Anticipate risks and disruptions
  • Allocate resources strategically
  • Design ethical frameworks before crises emerge
  • Foster international cooperation on AI safety

By turning prediction into preparation, we shift from reactive to proactive stewardship.

Conclusion: AI as a Mirror and a Map

AI’s ability to predict its own evolution is a profound development. It allows us to peer into the future - not with certainty, but with clarity. It offers a mirror to reflect on our trajectory, and a map to navigate the unknown.

But the ultimate responsibility remains human. AI can forecast its growth, but only we can decide what kind of future we want to build.

Just try the prompt on Copilot or your favorite AI-powered assistant! Have you got a different/similar result? How big or important is the difference? Any other thoughts?
Just share the link to the post with me and I'll add it to this post as a resource!

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


27 August 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 80: The Countdown to Superintelligence: How Soon Will AI Surpass Humans?)

Prompt Engineering Series

Prompt: "write a post of 600 words on how long it will take AI to surpass human given the exponential growth of AI from the past years"

Introduction

Artificial Intelligence (AI) has evolved from a niche academic pursuit to a transformative force reshaping industries, societies, and even our understanding of intelligence itself. With breakthroughs in deep learning, natural language processing, and autonomous systems, the question is no longer if AI will surpass human intelligence - but when.

Given the exponential trajectory of AI development, estimating the timeline for AI to outsmart humans is both urgent and elusive. Let’s explore the factors driving this acceleration and the challenges in predicting its tipping point.

The Exponential Curve of AI Progress

AI’s growth isn’t linear - it’s exponential. Consider the following milestones:

  • 2012: Deep learning revolutionized image recognition with AlexNet.
  • 2016: AlphaGo defeated world champion Lee Sedol in Go, a game once thought too complex for machines.
  • 2020s: Large language models like GPT and multimodal systems began generating human-like text, images, and even code.

Each leap builds on the last, compressing decades of progress into years. Moore’s Law may be slowing in hardware, but AI’s software capabilities are accelerating through better algorithms, larger datasets, and more efficient architectures.

Defining 'Surpassing Humans'

To estimate when AI will surpass humans, we must define what 'surpass' means:

  • Narrow Intelligence: AI already outperforms humans in specific domains - chess, protein folding, fraud detection.
  • General Intelligence: The ability to reason, learn, and adapt across diverse tasks. This is the holy grail - Artificial General Intelligence (AGI).
  • Superintelligence: Intelligence far beyond human capacity, capable of strategic planning, creativity, and self-improvement.

Most experts agree that AI has already surpassed humans in narrow tasks. AGI is the next frontier - and the most debated.

Predictions from the Field

Forecasts from researchers and forecasting communities span a wide range:

  • A 2022 community forecast on Metaculus, a prediction platform, put a 50% chance of AGI at around 2040.
  • Some optimists, like Ray Kurzweil, predict human-level AI by 2029.
  • Others, like Yann LeCun, argue that AGI is still decades away due to the complexity of human cognition.

The divergence stems from uncertainty in how intelligence scales, how much data is enough, and whether current architectures can generalize.

Barriers to Surpassing Humans

Despite rapid progress, several challenges remain:

  • Common Sense Reasoning: AI struggles with context, nuance, and ambiguity.
  • Embodiment: Human intelligence is shaped by physical experience - something machines lack.
  • Ethics and Alignment: Ensuring AI goals align with human values is a major hurdle.
  • Interpretability: We often don’t understand how AI systems arrive at decisions, making trust and control difficult.

These barriers may delay the timeline - but they also highlight the need for caution.

The Singularity Question

The concept of the 'technological singularity' - a point where AI self-improves beyond human control - adds urgency to the timeline. If AI reaches a threshold where it can redesign itself, progress could become uncontrollable and unpredictable.

Some theorists suggest this could happen within decades. Others argue it’s speculative and depends on breakthroughs we haven’t yet imagined.

Conclusion: Prepare for Acceleration, Not Certainty

While we can’t pinpoint the exact year AI will surpass humans, the trajectory is clear: exponential growth, increasing autonomy, and expanding capabilities. Whether it’s 2030, 2040, or beyond, the time to prepare is now.

Surpassing human intelligence isn’t just a technological milestone - it’s a societal turning point. It demands foresight, governance, and humility. Because once AI crosses that threshold, the future won’t be shaped by how smart machines are - but by how wisely we’ve guided them.

Just try the prompt on Copilot or your favorite AI-powered assistant! Have you got a different/similar result? How big or important is the difference? Any other thoughts?
Just share the link to the post with me and I'll add it to this post as a resource!

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


21 June 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 28: The Gray Areas of AI - Navigating Ethical and Practical Uncertainty)

Prompt Engineering Series

Prompt: "write a blogpost of 600 words on the gray areas of AI"

Introduction

Artificial Intelligence (AI) is transforming industries, automating processes, and reshaping human interactions. While AI offers immense benefits, it also presents gray areas - unclear ethical, legal, and societal dilemmas that challenge our understanding of responsible AI development. These uncertainties raise questions about bias, accountability, transparency, and the role of AI in decision-making.

1. AI and Bias: The Challenge of Fairness

One of the most debated gray areas in AI is bias in algorithms. AI models learn from historical data, but if that data contains racial, gender, or socioeconomic biases, AI can reinforce discrimination rather than eliminate it.

For example, AI-powered hiring systems have been found to favor certain demographics based on biased training data. Similarly, facial recognition technology has lower accuracy rates for people with darker skin tones, leading to misidentifications.

While AI developers strive to reduce bias, achieving complete fairness remains an ongoing challenge.

2. AI and Accountability: Who Is Responsible?

AI-driven decisions impact finance, healthcare, law enforcement, and hiring, but when AI makes mistakes, who is accountable?

  • If an AI-powered medical diagnosis tool misidentifies a disease, is the hospital, developer, or AI itself responsible?
  • If an autonomous vehicle causes an accident, should the manufacturer or AI system be held liable?
  • If AI-driven financial algorithms trigger market instability, who takes responsibility?

The lack of clear accountability creates legal and ethical uncertainty, making AI governance a complex issue.

3. AI and Transparency: The "Black Box" Problem

Many AI models operate as black boxes, meaning their decision-making processes are opaque and difficult to interpret. This raises concerns about:

  • Trust in AI-driven decisions, especially in high-stakes fields like healthcare and criminal justice.
  • Lack of explainability, making it hard for users to understand AI-generated outcomes.
  • Potential manipulation, where AI systems could be exploited without clear oversight.

Developers are working on explainable AI (XAI) to improve transparency, but full interpretability remains a challenge.

4. AI and Employment: Automation vs. Human Labor

AI-driven automation is replacing jobs in manufacturing, customer service, finance, and transportation. While AI creates new opportunities, it also raises concerns about:

  • Mass unemployment, as AI replaces human workers.
  • Economic inequality, where AI benefits corporations but disadvantages low-income workers.
  • The need for reskilling, requiring workers to adapt to AI-driven industries.

Balancing AI automation with human labor sustainability is a critical challenge for policymakers and businesses.

5. AI and Ethics: The Moral Dilemmas of AI Decision-Making

AI lacks human intuition, morality, and ethical reasoning, yet it is increasingly used in legal judgments, hiring decisions, and medical diagnoses. Ethical concerns include:

  • Should AI make life-or-death decisions in healthcare?
  • Can AI-driven surveillance infringe on privacy rights?
  • Should AI-generated content be regulated to prevent misinformation?

Without clear ethical guidelines, AI’s role in decision-making and governance remains a gray area.

Conclusion: AI’s Future Requires Ethical Oversight

AI’s gray areas - bias, accountability, transparency, employment impact, and ethical dilemmas - highlight the need for responsible AI development and regulation. Governments, businesses, and researchers must collaborate to define ethical AI standards, ensuring AI serves humanity without unintended harm.

Disclaimer: The whole text was generated by Copilot at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


09 April 2024

🧭Business Intelligence: Why Data Projects Fail to Deliver Real-Life Impact (Part IV: Making It in the Statistics)

Business Intelligence Series

Various sources (e.g., [1], [2], [3]) put the failure rates for data projects somewhere between 70% and 85%, somewhat higher than the failure rates of standard projects, estimated at 60-75%. This means that only 2-3 out of 10 projects will succeed, and that’s another reason to plan for failure, respectively to embrace the failure.

Unfortunately, the statistics advanced on project failure have no solid foundation and should be regarded with circumspection as long as the methodology and information about the population used for the estimates aren’t shared, though they do reflect an important point - many data projects do fail! It would be foolish to think that your project will not fail just because you’re a big company, you have the best resources, you have a proven rate of success, and you took all the precautions for the project not to fail.

Usually at the end of a project the team meets to document the lessons learned in the hope that the next projects will benefit from them. The team did learn something, though as practice shows, even if the team manages to avoid some issues, other issues will impact the next similar project, leading to similar variances. One can summarize this as "on the average the impact of new issues and avoided known issues tends to zero out" or "on average, the plusses and minuses balance each other across projects". It’s probably a question of focus - if organizations focus too much on certain aspects, other aspects are ignored and/or unseen.

So, your first data project will most likely fail. The question is: what do you do about it? It’s important to be aware of why projects, and data projects in particular, fail, though trying to consider and monitor each possible issue can prove to be ineffective. One can, however, create a risk register from the list, estimate the likelihood of each potential failure, and focus on only the top 3-5 with the highest risk. Of course, one should reevaluate the estimates on a regular basis, though that’s Risk Management 101.
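As a rough illustration of the risk-register idea, the Python sketch below ranks a few failure causes by an exposure score (probability times impact) and keeps only the top entries. The risk names, the probabilities, the 1-5 impact scale, and the `top_risks` helper are all assumptions made up for the example; they are not taken from the cited sources.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # estimated chance of occurring, 0..1 (hypothetical value)
    impact: int         # estimated severity, 1 (low) to 5 (high) (assumed scale)

    @property
    def exposure(self) -> float:
        # A common simple score: likelihood multiplied by impact.
        return self.probability * self.impact


def top_risks(register, n=3):
    """Return the n risks with the highest exposure, highest first."""
    return sorted(register, key=lambda r: r.exposure, reverse=True)[:n]


# Hypothetical register; the estimates should be revisited regularly.
register = [
    Risk("poor data quality", 0.6, 4),
    Risk("unclear requirements", 0.5, 5),
    Risk("missing skills", 0.4, 3),
    Risk("immature technology", 0.3, 3),
    Risk("lack of executive support", 0.2, 5),
]

for risk in top_risks(register, n=3):
    print(f"{risk.name}: exposure {risk.exposure:.2f}")
```

Re-running the ranking after each reevaluation of the estimates shows whether the focus should shift to other risks.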

Besides this, one should focus on how the team can make the project succeed. When adopting a technology, methodology or set of processes, it’s recommended to start with a proof-of-concept (PoC). To make the PoC a helpful experience it’s probably important to start with a topic that’s not too big to handle, but that also involves some complexity that would allow the organization to evaluate the targeted set of tools and technologies. It can also be a topic on which other organizations have made important progress, respectively succeeded. The temptation is big to approach the most pressing issues in the organization, respectively to build something big that can have an enormous impact on the organization. Jumping too soon into such topics can just increase the chances of failure.

One can also formulate the goals, objectives and further requirements in a form that allows the organization to build upon them even if the project fails. A PoC is about learning, building a foundation, doing the groundwork, exploring, mapping the unknown, and identifying what's still missing to make progress, respectively closing the full circle. A PoC is less about overachievement and a big impact, which can happen, though that is a consequence of the good work done in the PoC.

The bottom line: no matter whether you succeed or fail, once you start a project, you’ll still make it in the statistics! More important is what you’ve learnt after the first data project, respectively how you can use that knowledge in further projects to make a difference!


References:
[1] Harvard Business Review (2023) Keep Your AI Projects on Track, by Iavor Bojinov
[2] Cognilytica (2023) The Shocking Truth: 70-80% of AI Projects Fail!
[3] VentureBeat (2019) Why do 87% of data science projects never make it into production?

08 April 2024

🧭Business Intelligence: Why Data Projects Fail to Deliver Real-Life Impact (Part III: Failure through the Looking Glass)

Business Intelligence Series

There’s a huge volume of material available on project failure - resources that document why individual projects failed, why projects fail in general, and why project members, managers and/or executives think projects fail. There seems to be no more rewarding activity at the end of a project than to theorize why it failed, the topic occasionally culminating in the blame game. Success may generate applause, though it's failure that attracts attention and makes the most waves (irony, disapproval, and other similar behavior), and everybody seems to be an expert after the fact.

The mere definition of a project failure - not fulfilling the project’s objectives within the set budget and timeframe - is a misnomer because budgets and timelines are estimated based on the information available at the beginning of the project, the amount of uncertainty for many projects being considerable, and data projects are no exception. The higher the uncertainty, the less reliable the two estimates are. Even simple projects can reveal uncertainty, especially when the broader context of the projects is considered.

Even if it’s not a common practice, one way to cope with uncertainty is to add a tolerance for the estimates, though even this practice probably will not always accommodate the full extent of the unknown as the tolerances are usually small. The general expectation is to have an accurate and precise landing, which for big or exploratory projects is seldom possible!

Moreover, the assumptions under which the estimates hold are easily invalidated in practice - resources’ availability, first time right, executives’ support to set priorities, requirements’ quality, technologies’ maturity, etc. If one looks beyond the reasons why projects fail in general, quite often the issues are more organizational than technological, the lack of knowledge and experience being some of the factors.

Conversely, many projects will not get approved if the estimates don’t look positive, and therefore people are pressured in one way or another to make the numbers fit the expectations. Some projects, given their importance, need to be done even if the numbers don’t look good or can’t be quantified correctly. Other projects represent people’s subsistence on the job, respectively people's self-occupation to create motion, though they can occasionally also have a positive impact on the organizations. These kinds of aspects almost never make it into statistics or surveys. Neither do the big issues people are afraid to talk about. Not to mention that, in the light of office politics and the grapevine, the facts get distorted!

Data projects reflect all the symptoms of failure projects have in general, though when words like AI, Statistics or Machine Learning are used, the chances for failure are even higher given that the respective fields require a higher level of expertise, the appropriate use of technologies and adherence to the scientific process for the results to be valid. If projects can benefit from general recipes, respectively established procedures and methods, their range of applicability decreases when the mentioned areas are involved. 

Many data projects have an exploratory nature - seeing what’s possible - and therefore a considerable percentage will not reach production. Moreover, even those that get that far might end up being stopped or discarded sooner or later if they don’t deliver the expected value, and probably many of the models created in the process are biased, irrelevant, or incorrectly apply the theory. One should add that the mere use of tools and algorithms is not Data Science or Data Analysis.

The challenge for many data projects is to identify which Project Management (PM) best practices to consider. Following all or no practices at all just increases the risks of failure!


06 April 2024

🧭Business Intelligence: Why Data Projects Fail to Deliver Real-Life Impact (Part II: There's Value in Failure)

Business Intelligence Series

"Results are nothing; the energies which produce them
and which again spring from them are everything."
(Wilhelm von Humboldt, "On Language", 1836)

When the data is not available and is needed on a continuous basis then usually the solution is to redesign the processes and make sure the data becomes available at the needed quality level. Redesign involves additional costs for the business; therefore, it might be tempting to cancel or postpone data projects, at least until they become feasible, though they’re seldom feasible. 

Just because there’s a set of data, this doesn’t mean that there is important knowledge to be extracted from it, respectively that the investment is feasible. There’s however value in building experience in the internal resources, in identifying the challenges and the opportunities, in identifying what needs to be changed for harnessing the data. Unfortunately, organizations expect that somebody else will do the work for them instead of making the jump by themselves, and this approach will most likely fail. It’s like expecting to get enlightened after a few theoretical sessions with a guru rather than walking the path by oneself.

This is reflected also in organizations’ readiness to do the required endeavors for making the jump on the maturity scale. If organizations can’t approach such topics systematically and address the assumptions, opportunities, and risks adequately, respectively to manage the various aspects, it’s hard to believe that their data journey will be positive. 

A data journey shouldn’t be about politics even if some minds need to be changed in the process, at management as well as at lower level. If the leadership doesn’t recognize the importance of becoming an enabler for such initiatives, then the organization probably deserves to keep the status quo. The drive for change should come from the leadership even if we talk about data culture, data strategy, decision-making, or any critical aspect.

An organization will always need to find the balance between time, scope, cost, and quality, and this applies to operations, tactics, and strategies as well as to projects. There are hard limits and a lot of uncertainty associated with data projects and the tasks involved, limits reflected in cost and time estimations (which frankly are just experts’ rough guesses that can change for the worse in the light of new information). Therefore, especially in data projects one needs to be able to compromise, to change scope and timelines as seen fit, and why not, to cancel the projects if the objectives aren’t feasible anymore, respectively if compromises can’t be reached.

An organization must be able to take the risks and invest in failure, otherwise the opportunities for growth don’t change. What organizations lack in coping with the high uncertainty may be the ability to split a roadmap into small iterative steps. Besides breaking down the complexity and making progress, such steps allow evaluating the progress and the resulting knowledge, respectively incorporating the feedback and knowledge in the next steps. Instead, organizations seem to be fascinated by the big bang, thinking that technology can automatically fill the organizational gaps.

Doing the same thing repeatedly and expecting different results is called insanity. Unfortunately, this is what organizations and service providers do in what concerns Project Management in general and data projects in particular. Building something without a foundation, without making sure that the employees have the skillset, maturity and culture to manage the data-related tasks, challenges and opportunities is pure insanity!

Bottom line, harnessing the data requires a certain maturity and it starts with recognizing and pursuing opportunities, setting goals, following roadmaps, learning to fail and getting value from failure, respectively controlling the failure. Growth or instant enlightenment without a fair amount of sweat is possible, though that’s an exception for few in sight!


11 March 2024

🧭🚥Business Intelligence: Key Performance Indicators [KPI] (Between Certainty and Uncertainty)

Business Intelligence Series

Despite the huge collection of documented Key Performance Indicators (KPIs) and best practices on which KPIs to choose, choosing a reliable set of KPIs that reflect how the organization performs in achieving its objectives continues to be a challenge for many organizations. Ideally, for each objective there should be only one KPI that reflects the target and the progress made, though is that realistic?

Let's try to use the driver's metaphor to exemplify several aspects related to the choice of KPIs. A driver's goal is to travel from point A to point B over a distance d in x hours. The goal is SMART (Specific, Measurable, Achievable, Relevant, and Time-bound) if the speed and time are realistic and don't contradict legal or physical laws. The driver can define the objective as "arriving on time at the destination".

One can define a set of metrics based on the numbers that can be measured. We have the overall distance and the number of hours planned, from which one can derive an expected average speed v. To track a driver's progress over time there are several metrics that can thus be used: e.g., (1) the current average speed, (2) the number of kilometers to the destination, (3) the number of hours estimated to the destination. However, none of these metrics can be used alone to denote the performance. One can compare the expected with the current average speed to get a grasp of the performance, and probably many organizations will use only (1) as KPI, though one also needs either (2) or (3) to get the complete picture. So, in theory two KPIs should be enough. Is it so?
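The three metrics can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `trip_metrics`, the dictionary keys, and the sample numbers are hypothetical, and metric (3) is estimated by assuming the current average speed holds for the rest of the trip.

```python
def trip_metrics(distance_km, planned_hours, covered_km, elapsed_hours):
    """Compute the expected speed v and the tracking metrics (1)-(3)."""
    if covered_km <= 0 or elapsed_hours <= 0:
        raise ValueError("need some distance covered and time elapsed")
    expected_speed = distance_km / planned_hours   # v = d / x
    current_speed = covered_km / elapsed_hours     # metric (1)
    remaining_km = distance_km - covered_km        # metric (2)
    # Metric (3): assumes the current average speed is kept until arrival.
    hours_to_go = remaining_km / current_speed
    return {
        "expected_speed": expected_speed,
        "current_speed": current_speed,
        "remaining_km": remaining_km,
        "hours_to_go": hours_to_go,
        "on_time": elapsed_hours + hours_to_go <= planned_hours,
    }


# Hypothetical trip: 400 km planned in 5 hours, 150 km covered after 2 hours.
metrics = trip_metrics(400, 5, 150, 2)
for name, value in metrics.items():
    print(name, value)
```

In this sample the current average speed (75 km/h) is only slightly below the expected one (80 km/h), yet the remaining distance shows that the driver would arrive late, which is why (1) alone doesn't give the complete picture.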

When estimating (3) one assumes that there are no impediments and that the average speed can be attained, which might be correct for a road without traffic. There can be several impediments - planned/unplanned breaks, traffic jams, speed limits, accidents or other unexpected events, weather conditions (that depend on the season), etc. Besides the above formula, one needs to quantify such events in one form or another, e.g., through the perspective of the time added to the initial estimation from (3). However, this calculation is based on historical values or the navigator's estimation, a value which can be higher or lower than the final value.

Therefore, (3) is an approximation for which a confidence interval (± t hours) is also needed. The value can still include a lot of uncertainty that maybe needs to be broken down and quantified separately, case by case, to identify the deviation from expectations, e.g. on average there are 3 traffic jams (4), if the road crosses states or countries there may be at least 1 control on average (5), etc. These numbers can be included in (3) and the confidence interval, and usually don't need to be reported separately, though probably there are exceptions.

When planning, one also needs to consider the number of stops for refueling or recharging the car, and the average duration of such stops, which can be included in (3) as well. However, (3) slowly becomes too complex a formula, and even if there's an estimation, the more facts we pull into it, the wider the confidence interval will be. Sometimes, it's preferable to have two or three other metrics with narrow confidence intervals instead of one with high variation. Moreover, the longer the distance planned, the higher the uncertainty. It's one thing to plan a trip between two neighboring cities, and another to plan a trip around the world.
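The widening of the interval can be made concrete with a small Python sketch. It rests on assumptions that are not part of the metaphor itself: each delay source is treated as independent, described by a hypothetical mean and standard deviation in hours, and the spreads are combined in quadrature. The function name `estimate_with_spread` and all numbers are invented for the example.

```python
import math


def estimate_with_spread(base_hours, delays):
    """Return (estimated hours, combined spread in hours).

    delays: list of (mean_hours, std_hours), one per delay source,
    assumed independent of each other.
    """
    total = base_hours + sum(mean for mean, _ in delays)
    # Independent spreads add in quadrature (root of the summed squares).
    spread = math.sqrt(sum(std ** 2 for _, std in delays))
    return total, spread


base_hours = 5.0  # estimate (3) without any impediments (hypothetical)
delays = [
    (0.75, 0.5),   # traffic jams
    (0.25, 0.25),  # border or police control
    (0.5, 0.25),   # refueling or recharging stops
]

# Add one delay source at a time to see how estimate and spread evolve.
for count in range(len(delays) + 1):
    total, spread = estimate_with_spread(base_hours, delays[:count])
    print(f"{count} sources: {total:.2f} h ± {spread:.2f} h")
```

The spread here is one standard deviation; the reported ± t hours could be a multiple of it, depending on the confidence level one wants to state.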

Another assumption is that the capability of the driver/car to drive is the same over time, which is not always the case. This can be neglected occasionally (e.g. one trip), though it involves a risk (6) that might be useful to quantify, especially when the process is repeatable (e.g. regular commuting). The risk value can increase considering new information, e.g. knowing that every few thousand kilometers something breaks, or that there's a traffic fine, or an accident. In light of new information, the objective might also change, e.g. to arrive at the destination on time, safely and without fines. As the objective changes or further objectives are added, more metrics can be defined. It would make sense to measure how many kilometers the driver covered in a lifetime with the car (7), how many accidents (8) or how many fines (9) the driver had. (7) is not related to a driver's performance, but (8) and (9) are.

As can be seen, simple processes can also become very complex if one attempts to consider all the facts and/or quantify the uncertainty. The driver's metaphor applies to a single individual, though once the same process is considered across the whole organization (a group of drivers), more complexity is added and the perspective changes completely. E.g., some drivers might not even reach the destination or not even have a car to start with, and so on. Of course, with this the objectives also change and need to be redefined accordingly.

The driver's metaphor is good for considering planning activities in which a volume of work needs to be completed in a given time and where a set of constraints apply. Therefore, for some organizations, just using two numbers might be enough for getting a feeling for what's happening. However, as soon as one needs to consider other aspects like safety or compliance (considered in aggregation across many drivers), there might be other metrics that qualify as KPIs.

It's tempting to add two numbers and consider for example (8) and (9) together as the two are events that can be cumulated, even if they refer to different things that can overlap (an accident can result in a fine and should be counted maybe only once). One needs to make sure that one doesn't add apples with juice - the quantified values must have the same unit of measure, otherwise they might need to be considered separately. There's the tendency of mixing multiple metrics in a KPI that doesn't say much if the units of measure of its components are not the same. Some conversions can still be made (e.g. how much juice can be obtained from apples), though that's seldom the case.


26 December 2018

🔭Data Science: Uncertainty (Just the Quotes)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (1) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." (William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"The making of decisions, as everyone knows from personal experience, is a burdensome task. Offsetting the exhilaration that may result from correct and successful decision and the relief that follows the termination of a struggle to determine issues is the depression that comes from failure, or error of decision, and the frustration which ensues from uncertainty." (Chester I Barnard, "The Functions of the Executive", 1938)

"Uncertainty is introduced, however, by the impossibility of making generalizations, most of the time, which happens to all members of a class. Even scientific truth is a matter of probability and the degree of probability stops somewhere short of certainty." (Wayne C Minnick, "The Art of Persuasion", 1957)

"Statistics is a body of methods and theory applied to numerical evidence in making decisions in the face of uncertainty." (Lawrence Lapin, "Statistics for Modern Business Decisions", 1973)

"The most dominant decision type [that will have to be made in an organic organization] will be decisions under uncertainty." (Henry L Tosi & Stephen J Carroll, "Management", 1976)

"The greater the uncertainty, the greater the amount of decision making and information processing. It is hypothesized that organizations have limited capacities to process information and adopt different organizing modes to deal with task uncertainty. Therefore, variations in organizing modes are actually variations in the capacity of organizations to process information and make decisions about events which cannot be anticipated in advance." (Jay R Galbraith, "Organization Design", 1977)

"Probability is the mathematics of uncertainty. Not only do we constantly face situations in which there is neither adequate data nor an adequate theory, but many modern theories have uncertainty built into their foundations. Thus learning to think in terms of probability is essential. Statistics is the reverse of probability (glibly speaking). In probability you go from the model of the situation to what you expect to see; in statistics you have the observations and you wish to estimate features of the underlying model." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"Probability plays a central role in many fields, from quantum mechanics to information theory, and even older fields use probability now that the presence of 'noise' is officially admitted. The newer aspects of many fields start with the admission of uncertainty." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"Models are often used to decide issues in situations marked by uncertainty. However statistical inferences from data depend on assumptions about the process which generated these data. If the assumptions do not hold, the inferences may not be reliable either. This limitation is often ignored by applied workers who fail to identify crucial assumptions or subject them to any kind of empirical testing. In such circumstances, using statistical procedures may only compound the uncertainty." (David A Freedman & William C Navidi, "Regression Models for Adjusting the 1980 Census", Statistical Science Vol. 1 (1), 1986)

"The mathematical theories generally called 'mathematical theories of chance' actually ignore chance, uncertainty and probability. The models they consider are purely deterministic, and the quantities they study are, in the final analysis, no more than the mathematical frequencies of particular configurations, among all equally possible configurations, the calculation of which is based on combinatorial analysis. In reality, no axiomatic definition of chance is conceivable." (Maurice Allais, "An Outline of My Main Contributions to Economic Science", [Nobel lecture] 1988)

"The worst, i.e., most dangerous, feature of 'accepting the null hypothesis' is the giving up of explicit uncertainty. [...] Mathematics can sometimes be put in such black-and-white terms, but our knowledge or belief about the external world never can." (John Tukey, "The Philosophy of Multiple Comparisons", Statistical Science Vol. 6 (1), 1991)

"In nonlinear systems - and the economy is most certainly nonlinear - chaos theory tells you that the slightest uncertainty in your knowledge of the initial conditions will often grow inexorably. After a while, your predictions are nonsense." (M Mitchell Waldrop, "Complexity: The Emerging Science at the Edge of Order and Chaos", 1992)

"Statistics as a science is to quantify uncertainty, not unknown." (Chamont Wang, "Sense and Nonsense of Statistical Inference: Controversy, Misuse, and Subtlety", 1993)

"There is a new science of complexity which says that the link between cause and effect is increasingly difficult to trace; that change (planned or otherwise) unfolds in non-linear ways; that paradoxes and contradictions abound; and that creative solutions arise out of diversity, uncertainty and chaos." (Andy P Hargreaves & Michael Fullan, "What’s Worth Fighting for Out There?", 1998)

"Information entropy has its own special interpretation and is defined as the degree of unexpectedness in a message. The more unexpected words or phrases, the higher the entropy. It may be calculated with the regular binary logarithm on the number of existing alternatives in a given repertoire. A repertoire of 16 alternatives therefore gives a maximum entropy of 4 bits. Maximum entropy presupposes that all probabilities are equal and independent of each other. Minimum entropy exists when only one possibility is expected to be chosen. When uncertainty, variety or entropy decreases it is thus reasonable to speak of a corresponding increase in information." (Lars Skyttner, "General Systems Theory: Ideas and Applications", 2001)

"Any scientific data without (a stated) uncertainty is of no avail. Therefore the analysis and description of uncertainty are almost as important as those of the data value itself. It should be clear that the uncertainty itself also has an uncertainty – due to its nature as a scientific quantity – and so on. The uncertainty of an uncertainty is generally not determined." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"As uncertainties of scientific data values are nearly as important as the data values themselves, it is usually not acceptable that a best estimate is only accompanied by an estimated uncertainty. Therefore, only the size of nondominant uncertainties should be estimated. For estimating the size of a nondominant uncertainty we need to find its upper limit, i.e., we want to be as sure as possible that the uncertainty does not exceed a certain value." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Before best estimates are extracted from data sets by way of a regression analysis, the uncertainties of the individual data values must be determined. In this case care must be taken to recognize which uncertainty components are common to all the values, i.e., those that are correlated (systematic)." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Due to the theory that underlies uncertainties an infinite number of data values would be necessary to determine the true value of any quantity. In reality the number of available data values will be relatively small and thus this requirement can never be fully met; all one can get is the best estimate of the true value." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"For linear dependences the main information usually lies in the slope. It is obvious that those points that lie far apart have the strongest influence on the slope if all points have the same uncertainty. In this context we speak of the strong leverage of distant points; when determining the parameter 'slope' these distant points carry more effective weight. Naturally, this weight is distinct from the 'statistical' weight usually used in regression analysis." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In error analysis the so-called 'chi-squared' is a measure of the agreement between the uncorrelated internal and the external uncertainties of a measured functional relation. The simplest such relation would be time independence. Theory of the chi-squared requires that the uncertainties be normally distributed. Nevertheless, it was found that the test can be applied to most probability distributions encountered in practice." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In many cases systematic errors are interpreted as the systematic difference between nature (which is being questioned by the experimenter in his experiment) and the model (which is used to describe nature). If the model used is not good enough, but the measurement result is interpreted using this model, the final result (the interpretation) will be wrong because it is biased, i.e., it has a systematic deviation (not uncertainty). If we do not use the best model (the best theory) available for the description of a certain phenomenon this procedure is just wrong. It has nothing to do with an uncertainty." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is also inevitable for any model or theory to have an uncertainty (a difference between model and reality). Such uncertainties apply both to the numerical parameters of the model and to the inadequacy of the model as well. Because it is much harder to get a grip on these types of uncertainties, they are disregarded, usually." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is important that uncertainty components that are independent of each other are added quadratically. This is also true for correlated uncertainty components, provided they are independent of each other, i.e., as long as there is no correlation between the components." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is the nature of an uncertainty that it is not known and can never be known, whether the best estimate is greater or less than the true value." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Outliers or flyers are those data points in a set that do not quite fit within the rest of the data, that agree with the model in use. The uncertainty of such an outlier is seemingly too small. The discrepancy between outliers and the model should be subject to thorough examination and should be given much thought. Isolated data points, i.e., data points that are at some distance from the bulk of the data are not outliers if their values are in agreement with the model in use." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"The fact that the same uncertainty (e.g., scale uncertainty) is uncorrelated if we are dealing with only one measurement, but correlated (i.e., systematic) if we look at more than one measurement using the same instrument shows that both types of uncertainties are of the same nature. Of course, an uncertainty keeps its characteristics (e.g., Poisson distributed), independent of the fact whether it occurs only once or more often." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"To fulfill the requirements of the theory underlying uncertainties, variables with random uncertainties must be independent of each other and identically distributed. In the limiting case of an infinite number of such variables, these are called normally distributed. However, one usually speaks of normally distributed variables even if their number is finite." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In fact, H [entropy] measures the amount of uncertainty that exists in the phenomenon. If there were only one event, its probability would be equal to 1, and H would be equal to 0 - that is, there is no uncertainty about what will happen in a phenomenon with a single event because we always know what is going to occur. The more events that a phenomenon possesses, the more uncertainty there is about the state of the phenomenon. In other words, the more entropy, the more information." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"Data always vary randomly because the object of our inquiries, nature itself, is also random. We can analyze and predict events in nature with an increasing amount of precision and accuracy, thanks to improvements in our techniques and instruments, but a certain amount of random variation, which gives rise to uncertainty, is inevitable." (Alberto Cairo, "The Functional Art", 2011)

"The storytelling mind is allergic to uncertainty, randomness, and coincidence. It is addicted to meaning. If the storytelling mind cannot find meaningful patterns in the world, it will try to impose them. In short, the storytelling mind is a factory that churns out true stories when it can, but will manufacture lies when it can't." (Jonathan Gottschall, "The Storytelling Animal: How Stories Make Us Human", 2012)

"The data is a simplification - an abstraction - of the real world. So when you visualize data, you visualize an abstraction of the world, or at least some tiny facet of it. Visualization is an abstraction of data, so in the end, you end up with an abstraction of an abstraction, which creates an interesting challenge. […] Just like what it represents, data can be complex with variability and uncertainty, but consider it all in the right context, and it starts to make sense." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Without precise predictability, control is impotent and almost meaningless. In other words, the lesser the predictability, the harder the entity or system is to control, and vice versa. If our universe actually operated on linear causality, with no surprises, uncertainty, or abrupt changes, all future events would be absolutely predictable in a sort of waveless orderliness." (Lawrence K Samuels, "Defense of Chaos", 2013)

"The greater the uncertainty, the bigger the gap between what you can measure and what matters, the more you should watch out for overfitting - that is, the more you should prefer simplicity." (Brian Christian & Thomas L Griffiths, "Algorithms to Live By: The Computer Science of Human Decisions", 2016)

"A notable difference between many fields and data science is that in data science, if a customer has a wish, even an experienced data scientist may not know whether it’s possible. Whereas a software engineer usually knows what tasks software tools are capable of performing, and a biologist knows more or less what the laboratory can do, a data scientist who has not yet seen or worked with the relevant data is faced with a large amount of uncertainty, principally about what specific data is available and about how much evidence it can provide to answer any given question. Uncertainty is, again, a major factor in the data scientific process and should be kept at the forefront of your mind when talking with customers about their wishes."  (Brian Godsey, "Think Like a Data Scientist", 2017)

"The elements of this cloud of uncertainty (the set of all possible errors) can be described in terms of probability. The center of the cloud is the number zero, and elements of the cloud that are close to zero are more probable than elements that are far away from that center. We can be more precise in this definition by defining the cloud of uncertainty in terms of a mathematical function, called the probability distribution." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Uncertainty is an adversary of coldly logical algorithms, and being aware of how those algorithms might break down in unusual circumstances expedites the process of fixing problems when they occur - and they will occur. A data scientist’s main responsibility is to try to imagine all of the possibilities, address the ones that matter, and reevaluate them all as successes and failures happen." (Brian Godsey, "Think Like a Data Scientist", 2017)

"Bootstrapping provides an intuitive, computer-intensive way of assessing the uncertainty in our estimates, without making strong assumptions and without using probability theory. But the technique is not feasible when it comes to, say, working out the margins of error on unemployment surveys of 100,000 people. Although bootstrapping is a simple, brilliant and extraordinarily effective idea, it is just too clumsy to bootstrap such large quantities of data, especially when a convenient theory exists that can generate formulae for the width of uncertainty intervals." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Entropy is a measure of amount of uncertainty or disorder present in the system within the possible probability distribution. The entropy and amount of unpredictability are directly proportional to each other." (G Suseela & Y Asnath V Phamila, "Security Framework for Smart Visual Sensor Networks", 2019)

"Estimates based on data are often uncertain. If the data were intended to tell us something about a wider population (like a poll of voting intentions before an election), or about the future, then we need to acknowledge that uncertainty. This is a double challenge for data visualization: it has to be calculated in some meaningful way and then shown on top of the data or statistics without making it all too cluttered." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Uncertainty confuses many people because they have the unreasonable expectation that science and statistics will unearth precise truths, when all they can yield is imperfect estimates that can always be subject to changes and updates." (Alberto Cairo, "How Charts Lie", 2019)

"We over-fit when we go too far in adapting to local circumstances, in a worthy but misguided effort to be ‘unbiased’ and take into account all the available information. Usually we would applaud the aim of being unbiased, but this refinement means we have less data to work on, and so the reliability goes down. Over-fitting therefore leads to less bias but at a cost of more uncertainty or variation in the estimates, which is why protection against over-fitting is sometimes known as the bias/variance trade-off." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

15 December 2018

🔭Data Science: Probability (Just the Quotes)

"Probability is a degree of possibility." (Gottfried W Leibniz, "On estimating the uncertain", 1676)

"Probability, however, is not something absolute, [it is] drawn from certain information which, although it does not suffice to resolve the problem, nevertheless ensures that we judge correctly which of the two opposites is the easiest given the conditions known to us." (Gottfried W Leibniz, "Forethoughts for an encyclopaedia or universal science", cca. 1679)

"[…] the highest probability amounts not to certainty, without which there can be no true knowledge." (John Locke, "An Essay Concerning Human Understanding", 1689)

"As mathematical and absolute certainty is seldom to be attained in human affairs, reason and public utility require that judges and all mankind in forming their opinions of the truth of facts should be regulated by the superior number of the probabilities on the one side or the other whether the amount of these probabilities be expressed in words and arguments or by figures and numbers." (William Murray, 1773)

"All certainty which does not consist in mathematical demonstration is nothing more than the highest probability; there is no other historical certainty." (Voltaire, "A Philosophical Dictionary", 1881)

"Nature prefers the more probable states to the less probable because in nature processes take place in the direction of greater probability. Heat goes from a body at higher temperature to a body at lower temperature because the state of equal temperature distribution is more probable than a state of unequal temperature distribution." (Max Planck, "The Atomic Theory of Matter", 1909)

"Sometimes the probability in favor of a generalization is enormous, but the infinite probability of certainty is never reached." (William Dampier-Whetham, "Science and the Human Mind", 1912)

"There can be no unique probability attached to any event or behaviour: we can only speak of ‘probability in the light of certain given information’, and the probability alters according to the extent of the information." (Sir Arthur S Eddington, "The Nature of the Physical World", 1928)

"[…] the statistical prediction of the future from the past cannot be generally valid, because whatever is future to any given past, is in tum past for some future. That is, whoever continually revises his judgment of the probability of a statistical generalization by its successively observed verifications and failures, cannot fail to make more successful predictions than if he should disregard the past in his anticipation of the future. This might be called the ‘Principle of statistical accumulation’." (Clarence I Lewis, "Mind and the World-Order: Outline of a Theory of Knowledge", 1929)

"Science does not aim, primarily, at high probabilities. It aims at a high informative content, well backed by experience. But a hypothesis may be very probable simply because it tells us nothing, or very little." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"The most important application of the theory of probability is to what we may call 'chance-like' or 'random' events, or occurrences. These seem to be characterized by a peculiar kind of incalculability which makes one disposed to believe - after many unsuccessful attempts - that all known rational methods of prediction must fail in their case. We have, as it were, the feeling that not a scientist but only a prophet could predict them. And yet, it is just this incalculability that makes us conclude that the calculus of probability can be applied to these events." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"Equiprobability in the physical world is purely a hypothesis. We may exercise the greatest care and the most accurate of scientific instruments to determine whether or not a penny is symmetrical. Even if we are satisfied that it is, and that our evidence on that point is conclusive, our knowledge, or rather our ignorance, about the vast number of other causes which affect the fall of the penny is so abysmal that the fact of the penny’s symmetry is a mere detail. Thus, the statement 'head and tail are equiprobable' is at best an assumption." (Edward Kasner & James R Newman, "Mathematics and the Imagination", 1940)

"Probabilities must be regarded as analogous to the measurement of physical magnitudes; that is to say, they can never be known exactly, but only within certain approximation." (Emile Borel, "Probabilities and Life", 1943)

"Just as entropy is a measure of disorganization, the information carried by a set of messages is a measure of organization. In fact, it is possible to interpret the information carried by a message as essentially the negative of its entropy, and the negative logarithm of its probability. That is, the more probable the message, the less information it gives. Clichés, for example, are less illuminating than great poems." (Norbert Wiener, "The Human Use of Human Beings", 1950)

"To say that observations of the past are certain, whereas predictions are merely probable, is not the ultimate answer to the question of induction; it is only a sort of intermediate answer, which is incomplete unless a theory of probability is developed that explains what we should mean by ‘probable’ and on what ground we can assert probabilities." (Hans Reichenbach, "The Rise of Scientific Philosophy", 1951)

"Uncertainty is introduced, however, by the impossibility of making generalizations, most of the time, which happens to all members of a class. Even scientific truth is a matter of probability and the degree of probability stops somewhere short of certainty." (Wayne C Minnick, "The Art of Persuasion", 1957)

"Everybody has some idea of the meaning of the term 'probability' but there is no agreement among scientists on a precise definition of the term for the purpose of scientific methodology. It is sufficient for our purpose, however, if the concept is interpreted in terms of relative frequency, or more simply, how many times a particular event is likely to occur in a large population." (Alfred R Ilersic, "Statistics", 1959)

"Incomplete knowledge must be considered as perfectly normal in probability theory; we might even say that, if we knew all the circumstances of a phenomenon, there would be no place for probability, and we would know the outcome with certainty." (Félix E Borel, Probability and Certainty", 1963)

"Probability is the mathematics of uncertainty. Not only do we constantly face situations in which there is neither adequate data nor an adequate theory, but many modem theories have uncertainty built into their foundations. Thus learning to think in terms of probability is essential. Statistics is the reverse of probability (glibly speaking). In probability you go from the model of the situation to what you expect to see; in statistics you have the observations and you wish to estimate features of the underlying model." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985) 

"Probability plays a central role in many fields, from quantum mechanics to information theory, and even older fields use probability now that the presence of 'noise' is officially admitted. The newer aspects of many fields start with the admission of uncertainty." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"Probabilities are summaries of knowledge that is left behind when information is transferred to a higher level of abstraction." (Judea Pearl, "Probabilistic Reasoning in Intelligent Systems: Network of Plausible, Inference", 1988)

"[In statistics] you have the fact that the concepts are not very clean. The idea of probability, of randomness, is not a clean mathematical idea. You cannot produce random numbers mathematically. They can only be produced by things like tossing dice or spinning a roulette wheel. With a formula, any formula, the number you get would be predictable and therefore not random. So as a statistician you have to rely on some conception of a world where things happen in some way at random, a conception which mathematicians don’t have." (Lucien LeCam, [interview] 1988)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand. [...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996) 

"Often, we use the word random loosely to describe something that is disordered, irregular, patternless, or unpredictable. We link it with chance, probability, luck, and coincidence. However, when we examine what we mean by random in various contexts, ambiguities and uncertainties inevitably arise. Tackling the subtleties of randomness allows us to go to the root of what we can understand of the universe we inhabit and helps us to define the limits of what we can know with certainty." (Ivars Peterson, "The Jungles of Randomness: A Mathematical Safari", 1998)

"In the laws of probability theory, likelihood distributions are fixed properties of a hypothesis. In the art of rationality, to explain is to anticipate. To anticipate is to explain." (Eliezer S. Yudkowsky, "A Technical Explanation of Technical Explanation", 2005)

"For some scientific data the true value cannot be given by a constant or some straightforward mathematical function but by a probability distribution or an expectation value. Such data are called probabilistic. Even so, their true value does not change with time or place, making them distinctly different from  most statistical data of everyday life." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In fact, H [entropy] measures the amount of uncertainty that exists in the phenomenon. If there were only one event, its probability would be equal to 1, and H would be equal to 0 - that is, there is no uncertainty about what will happen in a phenomenon with a single event because we always know what is going to occur. The more events that a phenomenon possesses, the more uncertainty there is about the state of the phenomenon. In other words, the more entropy, the more information." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"The four questions of data analysis are the questions of description, probability, inference, and homogeneity. [...] Descriptive statistics are built on the assumption that we can use a single value to characterize a single property for a single universe. […] Probability theory is focused on what happens to samples drawn from a known universe. If the data happen to come from different sources, then there are multiple universes with different probability models.  [...] Statistical inference assumes that you have a sample that is known to have come from one universe." (Donald J Wheeler," Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"Entropy is a measure of amount of uncertainty or disorder present in the system within the possible probability distribution. The entropy and amount of unpredictability are directly proportional to each other." (G Suseela & Y Asnath V Phamila, "Security Framework for Smart Visual Sensor Networks", 2019)

About Me

Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.