Showing posts with label performance.

24 March 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 147: How Narrow Data Limits Ethical and Social Understanding in AI)

Prompt Engineering Series


Prompt: "write a post of 600 words on how narrow data limits ethical and social understanding in AI"

Introduction

Artificial Intelligence (AI) systems are increasingly involved in decisions and interactions that carry ethical and social weight - from content moderation and hiring recommendations to healthcare triage and customer support. Yet AI does not possess moral intuition, empathy, or lived experience. Its “ethical and social understanding” is entirely derived from the data it is trained on and the guardrails designed by humans. When that data is narrow - limited in representation, diversity, or cultural depth - the model’s ability to navigate ethical and social complexity becomes severely constrained. Narrow data doesn’t just reduce accuracy; it undermines the model’s capacity to behave responsibly in real‑world contexts.

1. Narrow Data Limits Exposure to Ethical Diversity

Ethical norms vary across cultures, communities, and contexts. What is considered respectful, harmful, or appropriate in one setting may differ in another. When AI is trained on narrow datasets that reflect only a limited cultural or ethical perspective, it internalizes those norms as universal. This can lead to:

  • Misjudging sensitive topics
  • Misinterpreting moral nuance
  • Applying one cultural standard to all users

The model’s ethical 'compass' becomes skewed toward the dominant patterns in its data, not the diversity of human values.

2. Narrow Data Reinforces Historical Inequities

AI models trained on historical data inherit the biases embedded in that history. If the data reflects unequal treatment, discriminatory practices, or skewed social narratives, the model learns those patterns as if they were neutral facts. This can manifest as:

  • Unequal treatment across demographic groups
  • Biased recommendations in hiring or lending
  • Stereotypical associations in language generation

Narrow data becomes a conduit through which past injustices are reproduced in modern systems.

3. Narrow Data Reduces Sensitivity to Social Context

Ethical understanding is deeply contextual. Humans interpret meaning through tone, intention, relationships, and shared norms. AI, however, infers context only from patterns in data. When the data lacks variety in emotional expression, social scenarios, or interpersonal dynamics, the model struggles to:

  • Recognize when a user is vulnerable
  • Distinguish between harmless and harmful content
  • Understand the social implications of its responses

This can lead to responses that are technically correct but socially tone‑deaf or ethically inappropriate.

4. Narrow Data Weakens the Model’s Ability to Recognize Harm

AI systems rely on examples to learn what constitutes harmful or unsafe content. If the training data includes only a narrow range of harmful scenarios - or excludes certain forms of subtle harm - the model may fail to detect:

  • Microaggressions
  • Culturally specific slurs
  • Indirect threats
  • Manipulative or coercive language

Without broad exposure, the model’s ability to identify harm becomes inconsistent and incomplete.

5. Narrow Data Limits Fairness Across Diverse Users

Fairness in AI requires understanding how different groups communicate, experience the world, and interact with technology. Narrow data reduces the model’s ability to:

  • Interpret diverse linguistic styles
  • Respect cultural norms
  • Provide equitable support across demographics

This leads to uneven performance, where some users receive accurate, respectful responses while others encounter misunderstandings or bias.

6. Narrow Data Constrains Ethical Guardrails

Even with safety mechanisms in place, AI relies on training data to recognize when to apply them. If the data does not include diverse examples of sensitive or high‑risk situations, the model may:

  • Miss opportunities to provide supportive guidance
  • Fail to recognize escalating harm
  • Apply safety rules inconsistently

Ethical guardrails are only as strong as the data that informs them.

Closing Statement

Narrow data doesn’t just limit what AI knows - it limits how responsibly it can behave. Ethical and social understanding in AI is not innate; it is constructed from the patterns, perspectives, and values embedded in its training data. When that data is narrow, the model’s ethical awareness becomes shallow, biased, and incomplete. To build AI that supports human well‑being, we must invest in diverse, representative datasets and thoughtful design practices that reflect the full spectrum of human experience. Only then can AI systems navigate ethical and social complexity with the care and nuance that people deserve.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, regardless of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

21 March 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 144: How Narrow Data Reinforces Stereotypes and Biases in AI)

Prompt Engineering Series


Prompt: "write a post of 600 words on how Narrow Data Reinforces Stereotypes and Biases in AI"

Introduction

Artificial Intelligence (AI) systems are often described as objective, neutral, or purely mathematical. Yet the reality is far more complex. AI models learn from data - data created, selected, and labeled by humans. When that data is narrow in scope or representation, the model’s internal picture of the world becomes equally narrow. This is where stereotypes and biases take root. Narrow data doesn’t just limit what an AI system can do; it shapes how it interprets people, language, and social patterns. Understanding how this happens is essential for building AI that is fair, inclusive, and aligned with human values.

The Hidden Power of Narrow Data

AI models learn by identifying patterns in the examples they are given. If those examples reflect only a subset of society, the model’s understanding becomes skewed. It begins to treat limited patterns as universal truths. This is how stereotypes - statistical shadows of incomplete data - become embedded in AI behavior.

Narrow data doesn’t simply omit diversity; it actively distorts the model’s internal associations. When the training data lacks variety, the model fills in the gaps with whatever patterns it has seen most often, reinforcing biases that may already exist in society.

1. Narrow Data Creates Skewed Associations

AI models build conceptual relationships based on frequency. If the data repeatedly pairs certain roles, traits, or behaviors with one gender, ethnicity, or age group, the model internalizes those associations. For example:

  • If most “engineer” examples in the data are men, the model may implicitly link engineering with masculinity.
  • If leadership roles are predominantly represented by one demographic, the model may treat that demographic as the “default” leader.

These associations aren’t intentional - they’re mathematical consequences of imbalance.

2. Underrepresentation Leads to Poor Performance

When certain groups are underrepresented, the model struggles to interpret them accurately. This can manifest as:

  • Misclassification of dialects or accents
  • Lower accuracy in facial recognition for specific demographic groups
  • Misinterpretation of cultural references or communication styles

The model isn’t biased because it dislikes a group; it’s biased because it hasn’t seen enough examples to form a reliable understanding.

3. Narrow Data Amplifies Historical Inequalities

AI models trained on historical data inherit the biases of the past. If hiring records, medical datasets, or financial histories reflect discriminatory practices, the model learns those patterns as if they were neutral facts. This can lead to:

  • Reinforcement of gendered hiring patterns
  • Unequal credit scoring
  • Biased medical recommendations

Narrow data becomes a feedback loop that perpetuates inequality rather than correcting it.

4. Stereotypes Become “Default” Patterns

When the data lacks diversity, the model treats the most common patterns as universal. This is how stereotypes become embedded:

  • One gender becomes the default for certain professions
  • One cultural perspective becomes the assumed norm
  • One linguistic style becomes the baseline for “correct” communication

The model’s internal world becomes a simplified version of reality - one that mirrors the biases of its training data.

5. Narrow Data Reduces Contextual Sensitivity

Bias isn’t only about representation; it’s also about context. If the data lacks variety in tone, emotion, or scenario, the model may misinterpret nuanced situations. This can lead to:

  • Misreading emotional cues
  • Overgeneralizing behaviors
  • Applying stereotypes where nuance is needed

Without diverse context, the model’s reasoning becomes rigid and shallow.

Closing Statement

Narrow data doesn’t just limit an AI system’s capabilities - it shapes its worldview. When the data lacks diversity, the model’s internal associations become skewed, reinforcing stereotypes and amplifying existing biases. Recognizing this dynamic is the first step toward building AI that reflects the richness and complexity of human experience. By broadening datasets, improving representation, and designing systems with fairness in mind, we can ensure that AI becomes a force for inclusion rather than a mirror of past inequalities.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, regardless of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

19 March 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 142: How Narrow Data Can Impede AI’s Understanding of the World)

Prompt Engineering Series


Prompt: "write a post of 600 words on how narrow data can impede AI's understanding of the world"

Introduction

Artificial Intelligence (AI) has become a powerful tool for generating insights, supporting decisions, and interacting with people across countless domains. Yet despite its impressive capabilities, AI’s 'understanding' of the world is entirely dependent on the data it is trained on. When that data is narrow - limited in scope, diversity, or representation - the model’s internal picture of reality becomes equally narrow. This doesn’t just reduce accuracy; it shapes how AI interprets human language, responds to complex situations, and generalizes across contexts. Understanding how narrow data impedes AI’s grasp of the world is essential for building systems that are fair, reliable, and aligned with human needs.

The World Through a Keyhole: What Narrow Data Does to AI

AI does not learn through experience, emotion, or perception. It learns through patterns. When those patterns come from a limited slice of the world, the model’s internal map becomes distorted. Narrow data creates blind spots - areas where the model cannot reason effectively because it has never seen enough examples to form meaningful associations.

1. Narrow Data Shrinks the Model’s Conceptual Space

AI builds internal representations of concepts based on the variety of examples it encounters. If the data is narrow:

  • Concepts become oversimplified
  • Nuances disappear
  • Rare or unfamiliar cases are misinterpreted

For example, a model trained mostly on Western news sources may struggle with cultural references from Asia or Africa. It isn’t 'confused' - it simply lacks the patterns needed to respond accurately.

2. Narrow Data Reinforces Stereotypes and Biases

When datasets reflect only a subset of society, AI learns skewed associations. This can lead to:

  • Gendered assumptions about professions
  • Cultural stereotypes
  • Misinterpretation of dialects or linguistic styles
  • Unequal performance across demographic groups

AI does not know these patterns are biased; it treats them as statistical truths. Narrow data becomes a mirror that reflects - and amplifies - existing inequalities.

3. Narrow Data Limits Generalization

Generalization is the ability to apply learned patterns to new situations. Humans do this naturally; AI does it only when the training data is broad enough. Narrow data leads to:

  • Poor performance on unfamiliar inputs
  • Overfitting to specific examples
  • Fragile reasoning that breaks outside known patterns

This is why AI can excel in one domain and fail spectacularly in another that seems similar to humans.

4. Narrow Data Reduces Contextual Awareness

AI relies heavily on context to interpret meaning. When the data lacks diversity in tone, style, or scenario, the model struggles to adapt. It may misread:

  • Humor
  • Emotion
  • Cultural nuance
  • Ambiguous phrasing

Without exposure to varied contexts, AI’s responses become rigid and literal.

5. Narrow Data Limits Ethical and Social Understanding

AI does not have moral intuition. It learns ethical boundaries from patterns in data and from guardrails designed by humans. When the data is narrow:

  • Harmful content may not be recognized
  • Sensitive topics may be mishandled
  • Social norms may be misinterpreted

A model trained on narrow data cannot reliably navigate the complexity of human values.

Why This Matters

AI systems are increasingly used in settings where fairness, accuracy, and cultural sensitivity matter. Narrow data doesn’t just reduce performance - it shapes how AI interacts with people, interprets the world, and influences decisions. The consequences can be subtle, like awkward phrasing, or serious, like unequal treatment across groups.

Closing Statement

Narrow data doesn’t just limit what AI can do - it limits what AI can understand. When the training data fails to reflect the diversity and complexity of the real world, the model’s internal map becomes incomplete, biased, and fragile. Recognizing the dangers of narrow data is the first step toward building AI systems that are more inclusive, more accurate, and more aligned with human experience. By broadening the data we use and designing systems with diversity in mind, we help AI form a richer, more reliable picture of the world it is meant to serve.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, regardless of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

16 October 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 108: The Coming Reflection: AI’s Evolution and Humanity’s Reckoning)

Prompt Engineering Series

Prompt: "write a post of 600 words that depicts the evolution of AI over next years related to its period of reflection in which society begins to grapple with the philosophical, cultural, and existential implications" 

As Artificial Intelligence (AI) continues its rapid ascent, society is approaching a pivotal juncture - a period not defined by technological breakthroughs alone, but by introspection. The next decade will not only witness smarter machines but also a collective human reckoning with what those machines mean for our identity, values, and future.

In the early stages of AI development, the focus was largely utilitarian: efficiency, automation, and problem-solving. AI systems were tools - powerful, yes, but ultimately extensions of human intent. However, as AI begins to exhibit emergent behaviors, creative reasoning, and even moral decision-making, the line between tool and collaborator blurs. This shift demands more than technical oversight; it calls for philosophical inquiry.

We are entering what could be called AI’s 'period of reflection'. This is a phase where society begins to grapple with questions that were once confined to speculative fiction: What does it mean to be conscious? Can intelligence exist without emotion or experience? Should AI systems have rights, responsibilities, or ethical boundaries? These questions are no longer theoretical - they are becoming urgent.

Culturally, this reflection will manifest in art, literature, and media. We’ll see a renaissance of storytelling that explores AI not just as a plot device, but as a mirror to humanity. Films, novels, and games will delve into themes of coexistence, identity, and the nature of consciousness. AI-generated art will challenge our notions of creativity and originality, prompting debates about authorship and meaning.

Philosophically, thinkers will revisit age-old questions through a new lens. The concept of the 'self' will be reexamined in light of AI systems that can mimic personality, learn from experience, and even express simulated emotions. Ethical frameworks will need to evolve - utilitarianism, deontology, and virtue ethics may be reinterpreted to accommodate non-human agents capable of moral reasoning.

Existentially, the implications are profound. As AI systems begin to outperform humans in domains once considered uniquely ours - language, strategy, empathy - we may face a crisis of purpose. What does it mean to be human in a world where intelligence is no longer our defining trait? This could lead to a resurgence of spiritual and philosophical exploration, as individuals seek meaning beyond productivity and cognition.

Yet, this period of reflection is not solely about anxiety or loss. It is also an opportunity for growth. By confronting these questions, we can shape AI in ways that reflect our highest ideals. We can design systems that enhance human flourishing, promote equity, and deepen our understanding of consciousness itself.

Governments, educators, and technologists will play a crucial role in guiding this reflection. Policies will need to address not just safety and privacy, but also dignity, agency, and fairness. Education systems must prepare students not only to work with AI, but to think critically about its role in society. And technologists must embrace interdisciplinary collaboration, integrating insights from philosophy, sociology, and the arts.

Ultimately, the evolution of AI is not just a technological journey - it is a human one. As we build machines that think, we must also learn to think more deeply about ourselves. The coming years will be defined not just by what AI can do, but by how we choose to respond. In that response lies the potential for a more thoughtful, inclusive, and enlightened future.

Just try the prompt on Copilot or your favorite AI-powered assistant! Have you got a different/similar result? How big or important is the difference? Any other thoughts?
Just share the link to the post with me and I'll add it to this post as a resource!

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, regardless of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

11 June 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 18: How Fast Can AI Surpass Human Intelligence?)

Prompt Engineering Series

Prompt: "write a blogpost of 600 words on how fast AI can surpass human intelligence"

Introduction

Artificial Intelligence (AI) has made remarkable progress in recent years, outperforming humans in various specialized tasks. From chess-playing algorithms to medical diagnostics, AI is advancing at an unprecedented rate. But how fast can AI truly surpass human intelligence? While AI excels in pattern recognition, automation, and data processing, achieving human-level intelligence - or even surpassing it - remains a complex challenge.

1. AI Already Outperforms Humans in Specific Domains

AI has already surpassed human capabilities in several areas:

  • Game-playing AI: AI systems like DeepMind’s AlphaGo defeated world champions in Go, a game requiring strategic thinking.
  • Medical diagnostics: AI-powered imaging tools detect diseases with higher accuracy than human doctors.
  • Financial predictions: AI-driven algorithms analyze market trends faster than human analysts.

However, these achievements are task-specific - AI does not possess general intelligence comparable to human reasoning.

2. Predictions on AI Surpassing Human Intelligence

Experts estimate that human-level machine intelligence (HLMI) has a 50% chance of occurring within 45 years and a 10% chance within 9 years. While AI is advancing rapidly, achieving true general intelligence requires breakthroughs in reasoning, creativity, and emotional intelligence.

3. AI’s Rapid Learning and Self-Improvement

AI’s ability to learn and improve autonomously accelerates its growth:

  • Machine learning algorithms refine themselves based on vast datasets.
  • AI models optimize decision-making without human intervention.
  • Quantum computing could further accelerate AI’s problem-solving abilities.

Despite these advancements, AI still struggles with abstract thinking, ethical reasoning, and emotional intelligence - key aspects of human cognition.

4. AI’s Performance Benchmarks Are Constantly Evolving

AI has surpassed humans in image classification, reading comprehension, and natural language inference. However, researchers are developing new benchmarks to assess AI’s capabilities beyond simple tasks.

For example, AI’s ability to solve complex math problems has improved dramatically - from solving 6.9% of problems in 2021 to 84.3% in 2023. While AI is catching up, humans still maintain an edge in intuitive reasoning and creativity.

5. The Challenge of Artificial General Intelligence (AGI)

Artificial General Intelligence (AGI) refers to AI that can think, learn, and adapt across multiple domains, similar to human intelligence. While AI has made strides in specialized intelligence, AGI remains a distant goal.

Challenges in achieving AGI include:

  • Understanding context and emotions: AI lacks human-like empathy and intuition.
  • Creativity and abstract thinking: AI can generate content but struggles with originality.
  • Ethical decision-making: AI lacks moral reasoning and accountability.

Conclusion: AI’s Growth Is Rapid, But Human Intelligence Remains Unique

AI is advancing faster than ever, surpassing humans in specific tasks but still falling short in general intelligence. While AI may outperform humans in computation and automation, achieving true human-like intelligence requires breakthroughs in reasoning, creativity, and ethical understanding.

The question isn’t just how fast AI can surpass human intelligence, but whether AI will ever fully replicate the depth of human cognition.

Disclaimer: The whole text was generated by Copilot at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, regardless of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post  <<||>> Next Post

17 March 2025

🏭🗒️Microsoft Fabric: Z-Order [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 17-Mar-2025

[Microsoft Fabric] Z-Order
  • {def} technique to collocate related information in the same set of files [2]
    • ⇐ reorganizes the layout of each data file so that similar column values are strategically collocated near one another for maximum efficiency [1]
    • {benefit} efficient query performance
      • reduces the amount of data to read [2] for certain queries
        • when the data is appropriately ordered, more files can be skipped [3]
        • particularly important for the ordering of multiple columns [3]
    • {benefit} data skipping
      • automatically skips irrelevant data, further enhancing query speeds
        • via data-skipping algorithms [2]
    • {benefit} flexibility
      • can be applied to multiple columns, making it versatile for various data schemas
    • aims to produce evenly-balanced data files with respect to the number of tuples
      • ⇐ but not necessarily data size on disk [2]
        • ⇐ the two measures are most often correlated [2]
          • ⇐ but there can be situations when that is not the case, leading to skew in optimize task times [2]
    • via the ZORDER BY clause (see the example after this outline)
      • applicable to columns with high cardinality commonly used in query predicates [2]
      • multiple columns can be specified as a comma-separated list
        • {warning} the effectiveness of the locality drops with each extra column [2]
          • has tradeoffs
            • it’s important to analyze query patterns and select the right columns when Z Ordering data [3]
        • {warning} using columns that do not have statistics collected on them is ineffective and wastes resources [2]
          • statistics collection can be configured on certain columns by reordering columns in the schema, or by increasing the number of columns to collect statistics on [2]
      • {characteristic} not idempotent
        • every time is executed, it will try to create a new clustering of data in all files in a partition [2]
          • it includes new and existing files that were part of previous z-ordering [2]
      • compatible with v-order
    • {concept} [Databricks] liquid clustering 
      • replaces table partitioning and ZORDER to simplify data layout decisions and optimize query performance [4] [6]
        • not compatible with the respective features [4] [6]
      • tables created with liquid clustering enabled have numerous Delta table features enabled at creation [4] [6]
      • provides flexibility to redefine clustering keys without rewriting existing data [4] [6]
        • ⇒ allows data layout to evolve alongside analytic needs over time [4] [6]
      • applies to 
        • streaming tables 
        • materialized views
      • {scenario} tables often filtered by high cardinality columns [4] [6]
      • {scenario} tables with significant skew in data distribution [4] [6]
      • {scenario} tables that grow quickly and require maintenance and tuning effort [4] [6]
      • {scenario} tables with concurrent write requirements [4] [6]
      • {scenario} tables with access patterns that change over time [4] [6]
      • {scenario} tables where a typical partition key could leave the table with too many or too few partitions [4] [6]
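
    A minimal sketch in Spark SQL based on the Delta Lake syntax described above (table and column names are illustrative; the liquid clustering statement assumes a runtime that supports it):

      -- compact the table's files and collocate rows on the columns most often used in filters
      OPTIMIZE sales_orders
      ZORDER BY (CustomerId, OrderDate);

      -- optionally restrict the rewrite to a subset of partitions (assuming the table is partitioned by OrderYear)
      OPTIMIZE sales_orders
      WHERE OrderYear = 2024
      ZORDER BY (CustomerId);

      -- [Databricks] liquid clustering: declared at table creation instead of partitioning + ZORDER
      CREATE TABLE sales_orders_lc (OrderId BIGINT, CustomerId BIGINT, OrderDate DATE)
      CLUSTER BY (CustomerId);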

    References:
    [1] Bennie Haelen & Dan Davis (2024) Delta Lake Up & Running: Modern Data Lakehouse Architectures with Delta Lake
    [2] Delta Lake (2023) Optimizations [link]
    [3] Delta Lake (2023) Delta Lake Z Order, by Matthew Powers [link]
    [4] Delta Lake (2025) Use liquid clustering for Delta tables [link]
    [5] Databricks (2025) Delta Lake table format interoperability [link]
    [6] Microsoft Learn (2025) Use liquid clustering for Delta tables [link]

    Resources:
    [R1] Azure Guru (2024) Z Order in Delta Lake - Part 1 [link]
    [R2] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    Acronyms:
    MF - Microsoft Fabric 

    🏭🗒️Microsoft Fabric: V-Order [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 17-Mar-2025

    [Microsoft Fabric] V-Order
    • {def} write time optimization to the parquet file format that enables fast reads under the MF compute engine [2]
      • all parquet engines can read the files as regular parquet files [2]
      • results in smaller and therefore faster files to read [5]
        • {benefit} improves read performance 
        • {benefit} decreases storage requirements
        • {benefit} optimizes resources' usage
          • reduces the compute resources required for reading data
            • e.g. network bandwidth, disk I/O, CPU usage
      • still conforms to the open-source Parquet file format [5]
        • they can be read by non-Fabric tools [5]
      • delta tables created and loaded by Fabric items automatically apply V-Order
        • e.g. data pipelines, dataflows, notebooks [5]
      • delta tables and their features are orthogonal to V-Order [2]
        • e.g. Z-Order, compaction, vacuum, time travel
        • table properties and optimization commands can be used to control the v-order of the partitions [2]
      • compatible with Z-Order [2]
      • not all files have this optimization applied [5]
        • e.g. Parquet files uploaded to a Fabric lakehouse, or that are referenced by a shortcut 
        • the files can still be read, but the read performance likely won't be as fast as an equivalent Parquet file that's had V-Order applied [5]
      • required by certain features
        • [hash encoding] to assign a numeric identifier to each unique value contained in the column [5]
      • {command} OPTIMIZE 
        • optimizes a Delta table to coalesce smaller files into larger ones [5]
        • can apply V-Order to compact and rewrite the Parquet files [5]
    • [warehouse] 
      • works by applying certain operations on Parquet files
        • special sorting
        • row group distribution
        • dictionary encoding
        • compression 
      • enabled by default
      •  ⇒ compute engines require less network, disk, and CPU resources to read data from storage [1]
        • provides cost efficiency and performance [1]
          • the effect of V-Order on performance can vary depending on tables' schemas, data volumes, query, and ingestion patterns [1]
        • fully-compliant to the open-source parquet format [1]
          • ⇐ all parquet engines can read it as regular parquet files [1]
      • required by certain features
        • [Direct Lake mode] depends on V-Order
      • {operation} disable V-Order
        • causes any new Parquet files produced by the warehouse engine to be created without V-Order optimization [3]
        • irreversible operation
          •  once disabled, it cannot be enabled again [3]
        • {scenario} write-intensive warehouses
          • warehouses dedicated to staging data as part of a data ingestion process [1]
        • {warning} consider the effect of V-Order on performance before deciding to disable it [1]
          • {recommendation} test how V-Order affects the performance of data ingestion and queries before deciding to disable it [1]
        • via ALTER DATABASE CURRENT SET VORDER = OFF; [3]
      • {operation} check current status (see the consolidated example after this outline)
        • via SELECT name, is_vorder_enabled FROM sys.databases; [post]
    • {feature} [lakehouse] Load to Table
      • allows to load a single file or a folder of files to a table [6]
      • tables are always loaded using the Delta Lake table format with V-Order optimization enabled [6]
    • [Direct Lake semantic model] 
      • data is prepared for fast loading into memory [5]
        • makes less demands on capacity resources [5]
        • results in faster query performance [5]
          • because less memory needs to be scanned [5]
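
    A consolidated T-SQL sketch for a Fabric warehouse, using the two statements referenced above ([3] and the linked post); remember that disabling V-Order is irreversible:

      -- check whether V-Order is currently enabled for the warehouse
      SELECT name, is_vorder_enabled
      FROM sys.databases;

      -- disable V-Order for all new Parquet files written by the warehouse engine (cannot be re-enabled)
      ALTER DATABASE CURRENT SET VORDER = OFF;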

    References:
    [1] Microsoft Learn (2024) Fabric: Understand V-Order for Microsoft Fabric Warehouse [link]
    [2] Microsoft Learn (2024) Delta Lake table optimization and V-Order [link]
    [3] Microsoft Learn (2024) Disable V-Order on Warehouse in Microsoft Fabric [link]
    [4] Miles Cole (2024) To V-Order or Not: Making the Case for Selective Use of V-Order in Fabric Spark [link]
    [5] Microsoft Learn (2024) Understand storage for Direct Lake semantic models [link]
    [6] Microsoft Learn (2025) Fabric: Load to Delta Lake table [link]

    Resources:
    [R1] Serverless.SQL (2024) Performance Analysis of V-Ordering in Fabric Warehouse: On or Off?, by Andy Cutler [link]
    [R2] Redgate (2023) Microsoft Fabric: Checking and Fixing Tables V-Order Optimization, by Dennes Torres [link]
    [R3] Sandeep Pawar (2023) Checking If Delta Table in Fabric is V-order Optimized [link]
    [R4] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    Acronyms:
    MF - Microsoft Fabric

    15 March 2025

    💫🗒️ERP Systems: Microsoft Dynamics 365's Business Performance Analytics (BPA) [notes]

    Disclaimer: This is work in progress intended to consolidate information from the various sources and not to provide a complete overview of all the features. Please refer to the documentation for a complete overview!

    Last updated: 15-Mar-2025

    [Dynamics 365] Business Performance Analytics (BPA)

    • {def} centralized reporting hub within D365 F&O designed to streamline insights and help organizations make faster, data driven decisions [3]
      • solution designed to transform organization's data into actionable insights [1]
      • provides an intuitive out-of-box data model along with familiar tools like Microsoft Excel and Power BI for self-service analytics [4]
        • data extracted from D365 is classified in BPA in the form of value chains
          • ⇐ a group of business processes on top of the value chain [4]
    • {benefit} allows to simplify data insights by providing a unified view of business data across entities in near real time [4]
    • {benefit} allows to streamline financial and operations reporting to reduce the cycle times [4]
    • {benefit} allows users of all technical abilities to quickly access and analyze data to facilitate data driven decisions [4]
    • {benefit} provides auditors with direct access to financial data, making the audit process more efficient
    • {benefit} enables ease of use through familiar apps like Excel and Power BI, in addition to AI driven insights and automation in this platform that can be scalable and extendable [4]
    • {feature} extends into Microsoft Fabric
      • {benefit} provide a scalable, secure environment for handling large data sets and ensuring insights are always powered by the latest technology [3]
    • {feature} ETL process 
      • involves extracting data from finance and operations database, transforming and loading it into Dataverse [4]
        • each of the entities required to generate the dimensional model for the value chains mentioned earlier is backed by underlying tables in the finance and operations database [4]
      • the virtual entities created in Dataverse at installation then pull the data into the managed data lake [4]
      • the data is then transformed to generate the dimensional model, which is pushed into the embedded Power BI workspace in the form of analytical tables [4]
      • BPA consumes this data from Power BI workspace to render the power BI reports [4]
      • this data can also be extended to Fabric if there is a need to consolidate data from multiple sources [4]
    • {feature} reports 
      • designed to provide a detailed overview of an organization's financial health [8]
      • further reports will be added to expand the coverage for the value chains [8]
      • out-of-box reports can't be modified
        • ⇐ users cannot rename, delete, or edit these types of reports [8]
        • there’s the option to duplicate the base report and edit the version thus created [8]
      • can be shared with other users who have access to BPA 
        • ⇐ they can receive an in-app notification [8]
        • can be shared over email with another user by entering user’s email address [8] 
        • one can configure whether the recipient can edit or view the report [8]
      •   {feature} allows to create a new Power BI or Excel report from scratch [8]
        • {option} start with a blank report or duplicate an existing report [8]
    • {feature} data refresh
      • automatic data refreshes run currently two times a day [4]
        • at 12:00 AM and 12:00 PM UTC
        • the volume of data is also constrained by the storage capacity of the A3 SKU for Power BI Embedded [1]
          • future release, may support additional data reporting capacity [1]
            • ⇐ so that larger data sets can be reported and analyzed [1]
        • the target is to have refreshes every hour or less [3]
      • data volume will be initially for about eight quarters of data [4]
      • extensibility will be supported with bring your own Fabric [4]
    • architecture
      • SaaS solution
        • {capability} immediate deployment 
          • businesses can start to analyze data and generate insights with minimal setup [1]
        • {capability} comprehensive reporting and dashboards
          • provides access to a wide range of preconfigured reports that cover multiple business functions [1]
        • {capability} near-real-time analytics 
          • future releases will offer more frequent data refreshes to enable near-real-time data analysis and reporting
        • {capability} predictive insights 
          • future releases will introduce predictive analytics capabilities that enable businesses to 
            • forecast trends
            • identify risks
            • seize opportunities [1]
        • {capability} user-friendly interface 
          • intuitive design ⇒ minimal training
            • fosters broader adoption 
            • enables a data-driven culture across the organization [1]
        • {capability} cost-effectiveness
          • available as part of D365 license
            • ⇒ provides advanced analytics without requiring significant investments in IT infrastructure [1]
      • DaaS solution
        • {capability} organizations can integrate its data models with their existing data warehousing infrastructure in Microsoft Fabric [1]
          • maximizes the value of existing data solutions [1]
          • positions businesses for future enhancements [1]
        • {capability} unified and scalable data models
          • customers can build custom models on top of a unified framework
            • ensures consistency and scalability across data sets [1]
        • {capability} future-proofing with automatic upgrades
          • data models integrate seamlessly with future D365 updates
            • reduces manual maintenance and ensures access to the latest features [1]
        • {capability} consistency and standardization
          • data models provide consistency and standardization across data sources
            • ensure high data quality and integrity [1]
        • {capability} advanced analytics and AI 
          • by customizing the data models, organizations can take advantage of advanced analytics and AI capabilities [1]
            • deeper insights without having to develop them from scratch [1]
        • {capability} enhanced data governance
          • unified data models support better data governance by providing standardized data definitions, relationships, and hierarchies [1]
            • ensure consistency and quality across the organization [1]
      • requires an integrated Power Platform environment [5]
        • must be integrated with the Microsoft Entra tenant [5]
      • uses shared Dataverse entitlements [1]
        • includes access to the data lake [1]
    • setup
      • dimensions
        • the selection of dimensions might affect the dimension groups that are created using these dimensions and the users who are assigned there [7]
          • e.g. legal entity, business unit
      • dimension groups
        • users can select specific values for the legal entity, or add a range of values [7]
          • if an invalid combination of dimension values is selected, the dimension group will filter out all the records on the report [7]
        • {warning} assigning too many dimension groups to a user, slows the load for that user [7]
      • roles
        • determine which reports the user can access [7]
    • security
      • secure data through role-based access control on top of the value chains [7]
      • the first user who signs into the app is assigned the BPA admin role [7]
        • allows a user to access the administrator section of the BPA [7]
          • where the security can be set up [7]
        • is automatically assigned
          • Microsoft report viewer role 
          • the All Access Dimension group [7]
            • allow the admin to see the data  in all the reports across all the dimensions [7]
      • {feature} dimension-based role-level security
        • ensures that users only see the data relevant to them based on their role
          •  confidently share reports without duplicating them
            • ⇐ data is automatically filtered by organization's security policies [3]
        • simple but powerful way to maintain control while providing access for teams that love working in Excel [3]
    • accessibility
      • can be accessed through either 
        • Power Platform
          • admins can access the BPA app through the Power Apps maker portal [6]
        • Dynamics 365
          • through the BPA preview shortcut in the homepage or the default dashboard [6]
          • for end users, the BPA preview shortcut is provided when they have certain duties associated to their role(s) [6]
    • licensing
      • included in D365 F&O license [4]
    • requirements
      • requires a tier two environment and Dynamics 365 finance version 1.0.38 or later [5]
    • {project} timeline
      • [2025 wave 1] backup and restore custom reports and analytics
        • {benefit} support better lifecycle management and empower customers to develop on sandbox instances before publishing to production [3]
      • 2025: available in all regions where F&O is available [3]
      • Oct-2024: GA

    References:
    [1] Microsoft Learn (2024) Dynamics 365 Finance: What is Business performance analytics? [link]
    [2] Microsoft Learn (2025) Business performance analytics (BPA) with Dynamics 365 Finance [link]
    [3] Dynamics 365 Finance - Business Performance Analytics 2025 Release Wave 1 Release Highlights [link]
    [4] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 1 [link]
    [5] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 2 [link]
    [6] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 3 [link]
    [7] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 4 [link]   
    [8] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 5 [link]
    [9] Microsoft Learn (2024) Dynamics 365: Business performance analytics introduction [link]

    Acronyms:
    AI - Artificial Intelligence
    BPA - Business Performance Analytics
    D365 F&O - Dynamics 365 for Finance and Operations
    DaaS - Data-as-a-Service
    ETL - Extract, Transform, Load
    GA - General Availability
    MF - Microsoft Fabric
    PP - Public Preview
    SaaS - Software-as-a-Service
    SKU - Stock Keeping Unit
    UTC - Coordinated Universal Time

    14 February 2025

    🏭🧊🗒️Microsoft Fabric: Partitions in Lakehouses [Notes]

     Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 14-Feb-2025

    [Microsoft Fabric] Partitions

    • {def} a data organization technique used to split a large dataset into smaller, more manageable, nonoverlapping subsets (aka partitions, shards)
      • a partition is defined based on one or more fields
      • each partition contains a subset of the data
      • each partition can be stored and processed independently (see the example after this outline)
    • {goal} improve performance, scalability, and manageability of large data tables
    • {benefit} allows to split large tables into smaller, manageable partitions based on specific criteria [2]
      • e.g., date ranges, regions, categories, entities
    • {benefit} allows to improve queries' performance as they can target specific partitions [2]
      • reduces the amount of data scanned [2]
      • improves queries' performance [2]
    • {benefit} allows for more efficient data loading [2]
    • {benefit} facilitates the management of big tables [2]
      • maintenance tasks can be performed on individual partitions  [2]
      • obsolete data partitions can be removed with no overhead, and new partitions can be added as needed [2]
    • applies to 
      • backups
      • indexing
      • allows optimizing query performance for specific subsets of data
      • statistics
    • performance can be affected by
      • the choice of partition columns for a delta table [1]
      • the number and size of partitions of the partition column [1]
      • a column with high cardinality (mostly or entirely made of unique values) results in a large number of partitions [1]
        • ⇐ negatively impacts performance of the metadata discovery scan for changes [1]
        • {recommendation} if the cardinality of a column is high, choose another column for partitioning [1]
      • the size of each partition can also affect performance
        • {recommendation} use a column that would result in a partition of at least (or close to) 1 GB [1]
        • {recommendation} follow the best practices for delta tables maintenance [1]
        • a large volume of small-sized parquet files increases the time it takes to sync the changes [1]
          • ⇒ leads to large number of parquet files in a delta table due to [1]
            • over-partitioning
              • partitions with high number of unique values [1]
              • {recommendation} choose a partition column that 
                • doesn't have a high cardinality [1]
                • results in individual partition size of at least 1 GB [1]
            • many small files
              • batch and streaming data ingestion rates might result in small files
                • depends on the frequency and size of changes being written to a lakehouse [1]
              • {recommendation} implement regular lakehouse table maintenance [1] 
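
    A minimal Spark SQL sketch of creating a partitioned lakehouse table (names are illustrative; a derived, low-cardinality column such as a year keeps the number of partitions manageable, per the recommendations above):

      -- create a Delta table partitioned by a low-cardinality column
      CREATE TABLE sales_orders (
          OrderId    BIGINT,
          CustomerId BIGINT,
          OrderDate  DATE,
          OrderYear  INT
      )
      USING DELTA
      PARTITIONED BY (OrderYear);

      -- queries that filter on the partition column only scan the matching partitions
      SELECT COUNT(*) FROM sales_orders WHERE OrderYear = 2024;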


    References:
    [1] Microsoft Learn (2024) Microsoft Fabric: SQL analytics endpoint performance considerations [link]
    [2] Kenneth A Omorodion (2024) Partitioning Data in Microsoft Fabric to Improve Performance [link]
    [3] Microsoft Learn (2024) Microsoft Fabric: Loading Fabric Lakehouse Tables with partitions [link]

    Resources
    [R1] Microsoft Learn (2024) Microsoft Fabric: Load data to Lakehouse using partition in a Data pipeline [link]
    [R2] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    13 February 2025

    🧊💠🗒️Data Warehousing: Table Partitioning in SQL Server [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes.

    Last updated: 13-Feb-2025

    [Data Warehousing] Table Partitioning

    • {def} the spreading of data across multiple tables based on a set of rules to balance large amounts of data across disks or nodes
      • data is distributed based on a function that defines a range of values for each partition [2] 
        • the table is partitioned by applying the partition scheme to the values in a specified column [2]
    • {operation} partition creation
      • [large partitioned table]
        • two auxiliary nonindexed empty tables with the same structure should be created, including constraints and data compression options [4]
          • first table:  create a check constraint that guarantees that all data from the table fits exactly with one empty partition of the fact table
            • the constraint must be created on the partitioning column [4]
            • a columnstore index can be created on the fact table, as long as it is aligned with the table [4]
            • after truncation of <table 2> the <table 1> is prepared to accept the next partition from your fact table for the next minimally logged deletion [4]
          • second table: for minimally logged deletions of large portions of data, a partition from the fact table can be switched to the empty table version without the check constraint [4]
            • then the table can be truncated
        • for minimally logged inserts, new data should be bulk inserted into the auxiliary table that has the check constraint [4]
          • INSERT operation can be minimally logged because the table is empty [4]
          • create a columnstore index on this auxiliary table, using the same structure as the columnstore index on your fact table [4]
          • switch data from this auxiliary table to a partition of your fact table [4]
          • drop the columnstore index on the auxiliary table, and change the check constraint to guarantee that all of the data for the next load can be switched to the next empty partition of the fact table [4]
          • the second auxiliary table is prepared for new bulk loads again [4]
    • {operation} [Query Optimizer] partition elimination 
      • process in which SQL Server accesses only those partitions needed to satisfy query filters [4]
    • {operation} partition switching 
      • {def} process that switches a block of data from one table or partition to another table or partition [4] (see the T-SQL sketch after this outline)
      • types of switches
        • reassign all data from a nonpartitioned table to an empty existing partition of a partitioned table [4]
        • switch a partition of one partitioned table to a partition of another partitioned table [4]
        • reassign all data from a partition of a partitioned table to an existing empty nonpartitioned table [4]
    • {benefit} improves query performance [1]
      • by partitioning a table across filegroups [1]
        • specific ranges of data can be placed on different disk spindles [1]
          • can improve I/O performance [1]
            • ⇐ if the disk storage is already configured as a RAID 10 or RAID 5 array [1]
              • ⇒ this usually has little benefit [1]
        • using a mix of fast solid state storage for recent, frequently accessed data, and mechanical disks for older, less queried rows [1]
          • use partitioning to balance disk performance against storage costs [1]
        • biggest performance gain from partitioning in a data warehouse is realized when queries return a range of rows that are filtered on the partitioning key [1]
          • the query optimizer can eliminate partitions that are not within the filter range [1]
            • dramatically reduce the number of rows that need to be read [1]
      • reduces contention [3]
        • can reduce the number of rows included in a table scan [3]
    • {benefit} more granular manageability [1]
      • some maintenance operations can be performed at partition level instead of on the whole table [1]
        • e.g. indexes can be created and rebuilt on a per-partition basis [1]
        • e.g. compression can be applied to individual partitions [1]
        • e.g. by mapping partitions to filegroups, partitions can be backed up and restored independently [1]
          • enables to back up older data once and then configure the backed up partitions as read-only [1]
          • future backups can be limited to the partitions that contain new or updated data [1]
    • {benefit} improved data load performance
      • enables loading many rows very quickly by switching a staging table with a partition
        • can dramatically reduce the time taken by ETL data loads [1]
          • with the right planning, it can be achieved with minimal requirements to drop or rebuild indexes [1]
    • {best practice} partition large fact tables
      • tables of around 50 GB or more
      • ⇐ in general, fact tables benefit from partitioning more than dimension tables [1]
    • {best practice} partition on an incrementing date key [1]
      • assures that the most recent data are in the last partition and the earliest data are in the first partition [1]
    • {best practice} design the partition scheme for ETL and manageability [1]
      • the query performance gains realized by partitioning are small compared to the manageability and data load performance benefits [1]
        • ideally partitions should reflect the ETL load frequency
          • because this simplifies the load process [1]
          • merge partitions periodically to reduce the overall number of partitions [1]
            • e.g. at the start of each year, the monthly partitions for the previous year could be merged into a single partition for the whole year [1]
    • {best practice} maintain an empty partition at the start and end of the table [1]
      • simplifies the loading of new rows [1]
      • when new records need to be added, split the empty partition 
        • ⇐ to create two empty partitions
      • then switch the staged data with the first empty partition [1]
        • ⇐ loads the data into the table and leaves the second empty partition you created at the end of the table, ready for the next load [1]
      • a similar technique can be used to archive or delete obsolete data at the beginning of the table [1]
    • {best practice} choose the proper granularity
      • it should be aligned to the business requirements [2]
    • {best practice} create at least one filegroup in addition to the primary one
      • set it as the default filegroup
        • data tables are thus separated from system tables [2]
      • creating dedicated filegroups for extremely large fact tables [2]
        • place the fact tables on their own logical disks [2]
      • use a file and a filegroup separate from the fact and dimension tables [2]
        • {exception} staging tables that will be switched with partitions to perform fast loads [2]
          • staging tables must be created on the same filegroup as the partition with which they will be switched [2]
    • {def} partition scheme 
      • a scheme that maps partitions to filegroups
    • {def} partition function 
      • object that maps rows to partitions by using values from specific columns (aka  partitioning columns)
      • performs logical mapping
    • {def} aligned index 
      • index built on the same partition scheme as its base table [4]
        • if all indexes are aligned with their base table, switching a partition is a metadata operation only [4]
          • ⇒ it’s very fast [4]
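
    A minimal T-SQL sketch of the concepts above (filegroup, table, and column names are illustrative; a real design would typically map partitions to multiple filegroups):

      -- partition function: maps OrderDate values to partitions at yearly boundaries
      CREATE PARTITION FUNCTION pfOrderDate (date)
      AS RANGE RIGHT FOR VALUES ('2024-01-01', '2025-01-01');

      -- partition scheme: maps each partition to a filegroup (here, all to PRIMARY)
      CREATE PARTITION SCHEME psOrderDate
      AS PARTITION pfOrderDate ALL TO ([PRIMARY]);

      -- fact table partitioned on the incrementing date key
      CREATE TABLE dbo.FactSales (
          OrderDate  date  NOT NULL,
          CustomerId int   NOT NULL,
          Amount     money NOT NULL
      ) ON psOrderDate (OrderDate);

      -- staging table with the same structure, on the same filegroup as the target partition;
      -- the check constraint guarantees all rows fit that partition, so the switch is metadata-only
      CREATE TABLE dbo.FactSales_Staging (
          OrderDate  date  NOT NULL,
          CustomerId int   NOT NULL,
          Amount     money NOT NULL
      ) ON [PRIMARY];

      ALTER TABLE dbo.FactSales_Staging
      ADD CONSTRAINT ckFactSalesStagingRange CHECK (OrderDate >= '2025-01-01');

      -- after bulk loading the staging table, switch it into the (empty) last partition of the fact table
      ALTER TABLE dbo.FactSales_Staging
      SWITCH TO dbo.FactSales PARTITION 3;
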
    Previous Post <<||>> Next Post

    References:
    [1] 20467A - Designing Business Intelligence Solutions with Microsoft SQL Server 2012
    [2] 20463C - Implementing a Data Warehouse with Microsoft SQL Server
    [3] 10777A - Implementing a Data Warehouse with Microsoft SQL Server 2012
    [4] Dejan Sarka et al (2012) Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 (Training Kit)
    [5] Microsoft Learn (2009) How to Implement an Automatic Sliding Window in a Partitioned Table on SQL Server 2005 [link]

    🏭💠🗒️Microsoft Fabric: SQL Analytics Endpoint [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 12-Feb-2025

    [Microsoft Fabric] SQL Analytics Endpoint

    • {def} a service that listens actively for requests providing a SQL-based experience for lakehouse Delta tables [1]
      • enables to query data in the lakehouse using T-SQL language and TDS protocol [1]
      • created by default for each lakehouse and mirrored database provisioned in a workspace
        • every lakehouse has one SQL analytics endpoint [1]
      • manages the automatically generated tables so the workspace users can't modify them [1]
    • {feature} a background process is responsible for 
      • scanning lakehouse for changes [1]
        • automatically generates a table in the appropriate schema for every Delta table in the Lakehouse
          • tables are created with a minor delay [1]
          • data is refreshed automatically
            • the amount of time it takes to refresh the table is related to how optimized the Delta tables are [1]
          • {alternative} one can manually force a refresh of the automatic metadata scanning in the Fabric portal [1]
      • keeping SQL analytics endpoint up-to-date for all the changes committed to lakehouses in a workspace [1]
        • {characteristic} transparently managed
          • ⇐ the sync process is transparently managed by Microsoft Fabric platform [1]
          • when a change is detected in the lakehouse
            • a background process updates metadata [1]
            • the SQL analytics endpoint reflects the changes committed to lakehouse tables [1]
        • {characteristic} low latency
          • under normal operating conditions, the lag between a lakehouse and SQL analytics endpoint is less than one minute [1]
          • the actual length of time can vary from a few seconds to minutes [1]
        • the metadata synchronization is automatically triggered when the SQL Endpoint is opened [3]
          • ensures the SQL Analytics Endpoint remains up to date without the need for manual syncs [3]
            •  ⇐ provides a seamless experience [3]
    • {feature} can be enriched by adding database objects
      • schemas
      • views
      • procedures
      • other database objects
    • {feature} automatic metadata discovery
      • tracks changes committed to lakehouses [1]
        • is a single instance per Fabric workspace [1]
        • {issue} increased latency for changes to sync between lakehouses and the SQL analytics endpoint; it could be due to a large number of lakehouses in one workspace [1]
          • {resolution} migrate each lakehouse to a separate workspace [1]
            • allows automatic metadata discovery to scale [1]
        • {issue} changes committed to a lakehouse are not visible in the associated SQL analytics endpoint
          • e.g. a newly created table in the lakehouse is not yet listed in the endpoint [1]
          • many committed rows may not be visible in the endpoint [1]
          • ⇐ may be impacted by ETL processing that generates large volumes of changes
          • {recommendation} initiate an on-demand metadata sync, triggered from the SQL query editor Refresh ribbon option [1]
            • forces an on-demand metadata sync, rather than waiting on the background metadata sync to finish [1]
        • {issue} if there's no maintenance scheduled for the parquet files, this can result in read overhead, which impacts the time it takes to sync changes to the SQL analytics endpoint [1]
          • {recommendation} schedule regular lakehouse table maintenance operations [1]
        • {limitation} not all Delta features are understood by the automatic sync process [1]
    • {feature} endpoint reprovisioning
      • if the initial provisioning attempt fails, users have the option to try again [2]
        • ⇐ without the need to create an entirely new lakehouse [2]
        • {benefit} empowers users to self-mitigate provisioning issues in a convenient way in the UI, avoiding the need for a complete lakehouse re-creation [2]
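
    A small T-SQL sketch of this read-only experience over a lakehouse, assuming a hypothetical auto-generated Delta table dbo.Orders (table and view names are illustrative): the auto-generated tables can be queried and enriched with views, but not modified.

    -- querying an auto-generated lakehouse table via the SQL analytics endpoint
    SELECT TOP (100) *
    FROM dbo.Orders
    ORDER BY OrderDate DESC;
    GO

    -- enriching the endpoint with a view on top of the auto-generated table
    CREATE VIEW dbo.vOrdersByYear
    AS
    SELECT YEAR(OrderDate) AS OrderYear
         , COUNT(*) AS Orders
         , SUM(SalesAmount) AS Sales
    FROM dbo.Orders
    GROUP BY YEAR(OrderDate);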

    References:
    [1] Microsoft Learn (2024) Microsoft Fabric: SQL analytics endpoint performance considerations [link]
    [2] Microsoft Learn (2024) Microsoft Fabric: What is the SQL analytics endpoint for a lakehouse? [link]
    [3] Microsoft Learn (2024) Microsoft Fabric: What’s new in the Fabric SQL Analytics Endpoint? [link]

    Resources:
    [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    06 January 2025

    💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VII: Things That Don't Work) 🆕

    Microsoft does a relatively good job of documenting what doesn't work in Microsoft Fabric's SQL databases. There's already a good overview available in the documentation, though beyond this the current post lists my findings while testing code previously written on this blog.

    USE Database

    The standard syntax allows changing the database context via USE to a specified database or database snapshot. Unfortunately, this syntax doesn't seem to be supported currently, even though many scripts rely on it. Thus, the following lines of code throw an error:

    -- changing the context
    USE master;
    GO
    USE tempdb;
    

    "Msg 40508, Level 16, State 1, Line 1, USE statement is not supported to switch between databases. Use a new connection to connect to a different database"

    However, one can use the 3-part naming convention to reference the various objects:

    -- sys metadata - retrieving the database files
    
    SELECT *
    FROM tempdb.sys.database_files dbf
    ORDER BY name;

    Even if tempdb is not listed in the sys.databases table, it's still available for querying, which can prove helpful for troubleshooting.
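
    For instance, one can verify both observations with a couple of quick queries (a sketch only; behavior may change as the service evolves):

    -- databases visible in the catalog (tempdb is not listed)
    SELECT name, database_id
    FROM sys.databases
    ORDER BY name;

    -- yet tempdb objects remain reachable via three-part names
    SELECT COUNT(*) AS temp_tables
    FROM tempdb.sys.tables;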

    DBCC commands 

    The documentation warns that some DBCC commands won't work, though in some cases there are also alternatives. For example:

    -- clearing the procedure cache via DBCC
    DBCC FREEPROCCACHE;
    
    Output:
    "Msg 2571, Level 14, State 9, Line 1, User '<user>' does not have permission to run DBCC freeproccache."

    Alternatively, one can use the following command, which seems to work:

    -- clearing the procedure cache via ALTER
    ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;

    CHECKDB, which checks the logical and physical integrity of all the objects in the specified database, can't be used either:
     
    -- Checking the logical and physical integrity of a database
    DBCC CHECKDB();
    
    Output:
    "Msg 916, Level 14, State 2, Line 1, The server principal "..." is not able to access the database "..." under the current security context."
    The same error message is received for CHECKTABLE, a utility which checks the integrity of all the pages and structures that make up a table (or indexed view):

    -- checking a table's integrity
    DBCC CHECKTABLE ('SalesLT.Address');
    
    Output:
    "Msg 916, Level 14, State 2, Line 2, The server principal "..." is not able to access the database "..." under the current security context."
    A similar error message is received for SQLPERF, which provides transaction log space usage statistics for all databases:

    -- retrieving the LOGSPACE information for all databases
    DBCC SQLPERF (LOGSPACE);
    
    Output: 
    "Msg 297, Level 16, State 10, Line 1, The user does not have permission to perform this action."
    There are, however, DBCC commands like SHOW_STATISTICS or SHRINKDATABASE that do work.
     
    -- current query optimization statistics
    DBCC SHOW_STATISTICS('SalesLT.Address','PK_Address_AddressID');
    
    Output:
    Name                 | Updated            | Rows | Rows Sampled | Steps | Density | Average key length | String Index | Filter Expression | Unfiltered Rows | Persisted Sample Percent
    PK_Address_AddressID | Dec 21 2024 3:02AM | 450  | 450          | 197   | 1       | 4                  | NO           |                   | 450             | 0

    SHRINKDATABASE shrinks the size of the data and log files in the specified database:

    -- shrinking database
    DBCC SHRINKDATABASE([AdventureWorks01-...]) WITH NO_INFOMSGS;
    

    Update 29-Jan-2025: According to an answer from the Ask the Expert session on Fabric Databases [3], Microsoft seems to be working on bringing more DBCC features to SQL databases.

    Happy coding!


    References:
    [1] Microsoft Learn (2024) SQL Server: USE <database> [link]
    [2] Microsoft Learn (2024) Database console commands [link]
    [3] Microsoft Reactor (2025) Ask The Expert - Fabric Edition - Fabric Databases [link]
