SQL Troubles: performance

Showing posts with label performance. Show all posts

11 June 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 18: How Fast Can AI Surpass Human Intelligence?)

Prompt Engineering Series

Prompt: "write a blogpost of 600 words on how fast AI can surpass human intelligence"

Introduction

Artificial Intelligence (AI) has made remarkable progress in recent years, outperforming humans in various specialized tasks. From chess-playing algorithms to medical diagnostics, AI is advancing at an unprecedented rate. But how fast can AI truly surpass human intelligence? While AI excels in pattern recognition, automation, and data processing, achieving human-level intelligence - or even surpassing it - remains a complex challenge.

1. AI Already Outperforms Humans in Specific Domains

AI has already surpassed human capabilities in several areas:

Game-playing AI: AI systems like DeepMind’s AlphaGo defeated world champions in Go, a game requiring strategic thinking.
Medical diagnostics: AI-powered imaging tools detect diseases with higher accuracy than human doctors.
Financial predictions: AI-driven algorithms analyze market trends faster than human analysts.

However, these achievements are task-specific - AI does not possess general intelligence comparable to human reasoning.

2. Predictions on AI Surpassing Human Intelligence

Experts estimate that human-level machine intelligence (HLMI) has a 50% chance of occurring within 45 years and a 10% chance within 9 years. While AI is advancing rapidly, achieving true general intelligence requires breakthroughs in reasoning, creativity, and emotional intelligence.

3. AI’s Rapid Learning and Self-Improvement

AI’s ability to learn and improve autonomously accelerates its growth:

Machine learning algorithms refine themselves based on vast datasets.
AI models optimize decision-making without human intervention.
Quantum computing could further accelerate AI’s problem-solving abilities.

Despite these advancements, AI still struggles with abstract thinking, ethical reasoning, and emotional intelligence - key aspects of human cognition.

4. AI’s Performance Benchmarks Are Constantly Evolving

AI has surpassed humans in image classification, reading comprehension, and natural language inference. However, researchers are developing new benchmarks to assess AI’s capabilities beyond simple tasks.

For example, AI’s ability to solve complex math problems has improved dramatically - from solving 6.9% of problems in 2021 to 84.3% in 2023. While AI is catching up, humans still maintain an edge in intuitive reasoning and creativity.

5. The Challenge of Artificial General Intelligence (AGI)

Artificial General Intelligence (AGI) refers to AI that can think, learn, and adapt across multiple domains, similar to human intelligence. While AI has made strides in specialized intelligence, AGI remains a distant goal.

Challenges in achieving AGI include:

Understanding context and emotions: AI lacks human-like empathy and intuition.
Creativity and abstract thinking: AI can generate content but struggles with originality.
Ethical decision-making: AI lacks moral reasoning and accountability.

Conclusion: AI’s Growth Is Rapid, But Human Intelligence Remains Unique

AI is advancing faster than ever, surpassing humans in specific tasks but still falling short in general intelligence. While AI may outperform humans in computation and automation, achieving true human-like intelligence requires breakthroughs in reasoning, creativity, and ethical understanding.

The question isn’t just how fast AI can surpass human intelligence, but whether AI will ever fully replicate the depth of human cognition.

Disclaimer: The whole text was generated by Copilot at the first attempt. This is just an experiment to evaluate feature's ability to answer standard general questions, independently on whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

17 March 2025

🏭🗒️Microsoft Fabric: Z-Order [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 17-Mar-2024

[Microsoft Fabric] Z-Order

{def} technique to collocate related information in the same set of files [2]

⇐ reorganizes the layout of each data file so that similar column values are strategically collocated near one another for maximum efficiency [1]
{benefit} efficient query performance

reduces the amount of data to read [2] for certain queries

when the data is appropriately ordered, more files can be skipped [3]
particularly important for the ordering of multiple columns [3]

{benefit} data skipping

automatically skips irrelevant data, further enhancing query speeds

via data-skipping algorithms [2]

{benefit} flexibility

can be applied to multiple columns, making it versatile for various data schemas

aims to produce evenly-balanced data files with respect to the number of tuples

⇐ but not necessarily data size on disk [2]

⇐ the two measures are most often correlated [2]

⇐ but there can be situations when that is not the case, leading to skew in optimize task times [2]

via ZORDER BY clause

applicable to columns with high cardinality commonly used in query predicates [2]
multiple columns can be specified as a comma-separated list

{warning} the effectiveness of the locality drops with each extra column [2]

has tradeoffs

it’s important to analyze query patterns and select the right columns when Z Ordering data [3]

{warning} using columns that do not have statistics collected on them is ineffective and wastes resources [2]

statistics collection can be configured on certain columns by reordering columns in the schema, or by increasing the number of columns to collect statistics on [2]

{characteristic} not idempotent

every time is executed, it will try to create a new clustering of data in all files in a partition [2]

it includes new and existing files that were part of previous z-ordering [2]

compatible with v-order

{concept} [Databricks] liquid clustering

replaces table partitioning and ZORDER to simplify data layout decisions and optimize query performance [4] [6]

not compatible with the respective features [4] [6]

tables created with liquid clustering enabled have numerous Delta table features enabled at creation [4] [6]
provides flexibility to redefine clustering keys without rewriting existing data [4] [6]

⇒ allows data layout to evolve alongside analytic needs over time [4] [6]

applies to

streaming tables
materialized views

{scenario} tables often filtered by high cardinality columns [4] [6]
{scenario} tables with significant skew in data distribution [4] [6]
{scenario} tables that grow quickly and require maintenance and tuning effort [4] [6]
{scenario} tables with concurrent write requirements [4] [6]
{scenario} tables with access patterns that change over time [4] [6]
{scenario} tables where a typical partition key could leave the table with too many or too few partitions [4] [6]

Previous Post <<||>> Next Post

References:

[1] Bennie Haelen & Dan Davis (2024) Delta Lake Up & Running: Modern Data Lakehouse Architectures with Delta Lake

[2] Delta Lake (2023) Optimizations [link]

[3] Delta Lake (2023) Delta Lake Z Order, by Matthew Powers [link]
[4] Delta Lake (2025) Use liquid clustering for Delta tables [link]
[5] Databricks (2025) Delta Lake table format interoperability [link]
[6] Microsoft Learn (2025) Use liquid clustering for Delta tables [link]

Resources:
[R1] Azure Guru (2024) Z Order in Delta Lake - Part 1 [link]

[R2] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
MF - Microsoft Fabric

🏭🗒️Microsoft Fabric: V-Order [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 17-Mar-2024

[Microsoft Fabric] V-Order

{def} write time optimization to the parquet file format that enables fast reads under the MF compute engine [2]

all parquet engines can read the files as regular parquet files [2]
results in a smaller and therefore faster files to read [5]

{benefit} improves read performance
{benefit} decreases storage requirements
{benefit} optimizes resources' usage

reduces the compute resources required for reading data

e.g. network bandwidth, disk I/O, CPU usage

still conforms to the open-source Parquet file format [5]

they can be read by non-Fabric tools [5]

delta tables created and loaded by Fabric items automatically apply V-Order

e.g. data pipelines, dataflows, notebooks [5]

delta tables and its features are orthogonal to V-Order [2]

e.g. Z-Order, compaction, vacuum, time travel
table properties and optimization commands can be used to control the v-order of the partitions [2]

compatible with Z-Order [2]
not all files have this optimization applied [5]

e.g. Parquet files uploaded to a Fabric lakehouse, or that are referenced by a shortcut
the files can still be read, the read performance likely won't be as fast as an equivalent Parquet file that's had V-Order applied [5]

required by certain features

[hash encoding] to assign a numeric identifier to each unique value contained in the column [5]

{command} OPTIMIZE

optimizes a Delta table to coalesce smaller files into larger ones [5]
can apply V-Order to compact and rewrite the Parquet files [5]

[warehouse]

works by applying certain operations on Parquet files

special sorting
row group distribution
dictionary encoding
compression

enabled by default
⇒ compute engines require less network, disk, and CPU resources to read data from storage [1]

provides cost efficiency and performance [1]

the effect of V-Order on performance can vary depending on tables' schemas, data volumes, query, and ingestion patterns [1]

fully-compliant to the open-source parquet format [1]

⇐ all parquet engines can read it as regular parquet files [1]

required by certain features

[Direct Lake mode] depends on V-Order

{operation} disable V-Order

causes any new Parquet files produced by the warehouse engine to be created without V-Order optimization [3]
irreversible operation

once disabled, it cannot be enabled again [3]

{scenario} write-intensive warehouses

warehouses dedicated to staging data as part of a data ingestion process [1]

{warning} consider the effect of V-Order on performance before deciding to disable it [1]

{recommendation} test how V-Order affects the performance of data ingestion and queries before deciding to disable it [1]

via ALTER DATABASE CURRENT SET VORDER = OFF; [3]

{operation} check current status

via SELECT name, is_vorder_enabled FROM sys.databases; [post]

{feature} [lakehouse] Load to Table

allows to load a single file or a folder of files to a table [6]
tables are always loaded using the Delta Lake table format with V-Order optimization enabled [6]

[Direct Lake semantic model]

data is prepared for fast loading into memory [5]

makes less demands on capacity resources [5]
results in faster query performance [5]

because less memory needs to be scanned [5]

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Fabric: Understand V-Order for Microsoft Fabric Warehouse [link]
[2] Microsoft Learn (2024) Delta Lake table optimization and V-Order [link]
[3] Microsoft Learn (2024) Disable V-Order on Warehouse in Microsoft Fabric [link]
[4] Miles Cole (2024) To V-Order or Not: Making the Case for Selective Use of V-Order in Fabric Spark [link]
[5] Microsoft Learn (2024) Understand storage for Direct Lake semantic models [link]

[6] Microsoft Learn (2025] Fabric: Load to Delta Lake table [link]

Resources:
[R1] Serverless.SQL (2024) Performance Analysis of V-Ordering in Fabric Warehouse: On or Off?, by Andy Cutler [link]
[R2] Redgate (2023 Microsoft Fabric: Checking and Fixing Tables V-Order Optimization, by Dennes Torres [link]
[R3] Sandeep Pawar (2023) Checking If Delta Table in Fabric is V-order Optimized [link]

[R4] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
MF - Microsoft Fabric

15 March 2025

💫🗒️ERP Systems: Microsoft Dynamics 365's Business Performance Analytics (BPA) [notes]

Disclaimer: This is work in progress intended to consolidate information from the various sources and not to provide a complete overview of all the features. Please refer to the documentation for a complete overview!

Last updated: 15-Mar-2025

[Dynamics 365] Business Performance Analytics (BPA)

{def} centralized reporting hub within D365 F&O designed to streamline insights and help organizations make faster, data driven decisions [3]

solution designed to transform organization's data into actionable insights [1]
provides an intuitive out-of-box data model along with familiar tools like Microsoft Excel and Power BI for self-service analytics [4]

data extracted from D365 is classified in BPA in the form of value chains

⇐ a group of business processes on top of the value chain [4]

{benefit} allows to simplify data insights by providing a unified view of business data across entities in near real time [4]
{benefit} allows to streamline financial and operations reporting to reduce the cycle times [4]
{benefit} allows users of all technical abilities to quickly access and analyze data to facilitate data driven decisions [4]
{benefit} provides auditors with direct access to financial data, making the audit process more efficient
{benefit} enables ease of use through familiar apps like Excel and Power BI, in addition to AI driven insights and automation in this platform that can be scalable and extendable [4]
{feature} extends into Microsoft Fabric

{benefit} provide a scalable, secure environment for handling large data sets and ensuring insights are always powered by the latest technology [3]

{feature} ETL process

involves extracting data from finance and operations database, transforming and loading it into Dataverse [4]

each of the entities required for the generation of the dimensional model for the value chains that were mentioned earlier, they are backed by the underlying tables in finance and operations database [4]

installed in Dataverse, virtual entities that are created will then pull in the data into the managed data lake [4]
the data is then transformed to generate the dimensional model which is then pushed into the embedded Power BI workspace in the form of analytical tables [4]
BPA consumes this data from Power BI workspace to render the power BI reports [4]
this data can also be extended to Fabric if there is a need to consolidate data from multiple sources [4]

{feature} reports

designed to provide a detailed overview of an organization's financial health [8]
further reports will be added to expand the coverage for the value chains [8]
out-of-box reports can't be modified

⇐ users cannot rename, delete or edit these type of reports [8]
there’s the option to duplicate the base report and edit the version thus created [8]

can be shared with other users who have access to BPA

⇐ they can receive an in-app notification [8]
can be shared over email with another user by entering user’s email address [8]
one can configure whether the recipient can edit or view the report [8]

{feature} allows to create a new Power BI or Excel report from scratch [8]

{option} start with a blank report or duplicate an existing report [8]

{feature} data refresh

automatic data refreshes run currently two times a day [4]

at 12:00 AM and 12:00 PM UTC
the volume of data is also constrained by the storage capacity of the A3 SKU for Power BI Embedded [1]

future release, may support additional data reporting capacity [1]

⇐ so that larger data sets can be reported and analyzed [1]

the target is to have refreshes every hour or less [3]

data volume will be initially for about eight quarters of data [4]
extensibility will be supported with bring your own Fabric [4]

architecture

SaaS solution

{capability} immediate deployment

businesses can start to analyze data and generate insights with minimal setup [1]

{capability} comprehensive reporting and dashboards

provides access to a wide range of preconfigured reports that cover multiple business functions [1]

{capability} near-real-time analytics

future releases will offer more frequent data refreshes to enable near-real-time data analysis and reporting

{capability} predictive insights

future releases will introduce predictive analytics capabilities that enable businesses to

forecast trends
identify risks
seize opportunities [1]

{capability} user-friendly interface

intuitive design ⇒ minimal training

fosters broader adoption
enables a data-driven culture across the organization [1]

{capability} cost-effectiveness

available as part of D365 license

⇒ provides advanced analytics without requiring significant investments in IT infrastructure [1]

DaaS solution

{capability} organizations can integrate its data models with their existing data warehousing infrastructure in Microsoft Fabric [1]

maximizes the value of existing data solutions [1]
positions businesses for future enhancements [1]

{capability} unified and scalable data models

customers can build custom models on top of a unified framework

ensures consistency and scalability across data sets [1]

{capability} future-proofing with automatic upgrades

data models integrate seamlessly with future D365 updates

reduces manual maintenance and ensures access to the latest features [1]

{capability} consistency and standardization

data models provide consistency and standardization across data sources

ensure high data quality and integrity [1]

{capability} advanced analytics and AI

by customizing the data models, organizations can take advantage of advanced analytics and AI capabilities [1]

deeper insights without having to develop them from scratch [1]

{capability} enhanced data governance

unified data models support better data governance by providing standardized data definitions, relationships, and hierarchies [1]

ensure consistency and quality across the organization [1]

requires an integrated Power Platform environment [5]

must be integrated with the Microsoft Entra tenant [5]

uses shared Dataverse entitlements [1]

includes access to the data lake [1]

setup

dimensions

the selection of dimensions might affect the dimension groups that are created using these dimensions and the users who are assigned there [7]

e.g. legal entity, business unit

dimension groups

users can select specific values for the legal entity, or add a range of values [7]

selecting an invalid combination of dimension values, the dimension group will filter out all the records on the report [7]

{warning} assigning too many dimension groups to a user, slows the load for that user [7]

roles

determine which reports the user can access [7]

security

secure data through role-based access control on top of the value chains [7]
the first user who signs into the app is assigned the BPA admin role [7]

allows a user to access the administrator section of the BPA [7]

where the security can be set up [7]

has automatically assigned

Microsoft report viewer role
the All Access Dimension group [7]

allow the admin to see the data in all the reports across all the dimensions [7]

{feature} dimension-based role-level security

ensures that users only see the data relevant to them based on their role

confidently share reports without duplicating them

⇐ data is automatically filtered by organization's security policies [3]

simple but powerful way to maintain control while providing access for teams that love working in Excel [3]

accessibility

can be accessed through either

Power Platform

admins can access BPA app through PowerApps' makeup portal [6]

Dynamics 365

through the BPA preview shortcut in the homepage or the default dashboard [6]
for end users, the BPA preview shortcut is provided when they have certain duties associated to their role(s) [6]

licensing

included in D365 F&O license [4]

requirements

requires a tier two environment and Dynamics 365 finance version 1.0.38 or later [5]

{project} timeline

[2025 wave 1] backup and restore custom reports and analytics

{benefit} support better lifecycle management and empower customers to develop on sandbox instances before publishing to production [3]

2025: available in all regions where F&O is available [3]
Oct-2024: GA

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Dynamics 365 Finance: What is Business performance analytics? [link]
[2] Microsoft Learn (2025) Business performance analytics (BPA) with Dynamics 365 Finance [link]
[3] Dynamics 365 Finance - Business Performance Analytics 2025 Release Wave 1 Release Highlights [link]
[4] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 1 [link]
[5] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 2 [link]
[6] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 3 [link]
[7] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 4 [link]
[8] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 5 [link]
[9] Microsoft Learn (2024) Dynamics 365: Business performance analytics introduction [link]

Acronyms:
AI - Artificial Intelligence
BPA - Business Performance Analytics
D365 F&O - Dynamics 365 for Finance and Operations
DaaS - Data-as-a-Service
ETL - Extract, Transfer, Load
GA - General Availability
MF - Microsoft Fabric
PP - Public Preview
SaaS - Software-as-a-Service
SKU - Stock Keeping Unit
UTC - Coordinated Universal Time

14 February 2025

🏭🧊🗒️Microsoft Fabric: Partitions in Lakehouses [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 14-Feb-2024

[Microsoft Fabric] Partitions

{def} a data organization technique used to split a large dataset into smaller, more manageable nonoverlapping subsets (aka partitions, shards)

a pattition is defined based on one or more fields
each partition contains a subset of the data
each partitions can be stored and processed independently

{goal} improve performance, scalability, and manageability of large data tables
{benefit} allows to split large tables into smaller, manageable partitions based on specific criteria [2]

e.g., date ranges, regions, categories, entities

{benefit} allows to improve queries' performance as they can target specific partitions [2]

reduces the amount of data scanned [2]
improves queries' performance [2]

{benefit} allows for more efficient data loading [2]
{benefit} facilitates the management of big tables [2]

maintenance tasks can be performed on individual partitions [2]
obsolete data partitions can be removed with no overhead, adding new partitions on a need basis [2]

applies to

backups
indexing
allows optimizing query performance for specific subsets of data
statistics

performance can be affected by

the choice of partition columns for a delta table [1]
the number and size of partitions of the partition column [1]
a column with high cardinality (mostly or entirely made of unique values) results in a large number of partitions [1]

⇐ negatively impacts performance of the metadata discovery scan for changes [1]
{recommendation} if the cardinality of a column is high, choose another column for partitioning [1]

the size of each partition can also affect performance

{recommendation} use a column that would result in a partition of at least (or close to) 1 GB [1]
{recommendation} follow the best practices for delta tables maintenance [1]
a large volume of small-sized parquet files increases the time it takes to sync the changes [1]

⇒ leads to large number of parquet files in a delta table due to [1]

over-partitioning

partitions with high number of unique values [1]
{recommendation} choose a partition column that

doesn't have a high cardinality [1]
results in individual partition size of at least 1 GB [1]

many small files

batch and streaming data ingestion rates might result in small files

depends on the frequency and size of changes being written to a lakehouse [1]

{recommendation} implement regular lakehouse table maintenance [1]

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Microsoft Fabric: SQL analytics endpoint performance considerations [link]

[2] Kenneth A Omorodion (2024) Partitioning Data in Microsoft Fabric to Improve Performance

written [link]

[3] Microsoft Learn (2024) Microsoft Fabric: Loading Fabric Lakehouse Tables with partitions [link]

[4]

Resources

[R1] Microsoft Learn (2024) Microsoft Fabric: Load data to Lakehouse using partition in a Data pipeline [link]

[R2] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

13 February 2025

🧊💠🗒️Data Warehousing: Table Partitioning in SQL Server [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes.

Last updated: 13-Feb-2025

[Data Warehousing] Table Partitioning

{def} the spreading of data across multiple tables based on a set of rules to balance large amounts of data across disks or nodes

data is distributed based on a function that defines a range of values for each partition [2]

the table is partitioned by applying the partition scheme to the values in a specified column [2]

{operation} partition creation

[large partitioned table]

should be created two auxiliary nonindexed empty tables with the same structure, including constraints and data compression options [4]

first table: create a check constraint that guarantees that all data from the table fits exactly with one empty partition of the fact table

the constraint must be created on the partitioning column [4]
a columnstore index can be created on the fact table, as long as it is aligned with the table [4]
after truncation of <table 2> the <table 1> is prepared to accept the next partition from your fact table for the next minimally logged deletion [4]

second table: for minimally logged deletions of large portions of data, a partition from the fact table can be switched to the empty table version without the check constraint [4]

then the table can be truncated

for minimally logged inserts, new data to the second auxiliary table should be bulk inserted in the auxiliary that has the check constraint [4]

INSERT operation can be minimally logged because the table is empty [4]
create a columnstore index on this auxiliary table, using the same structure as the columnstore index on your fact table [4]
switch data from this auxiliary table to a partition of your fact table [4]
drop the columnstore index on the auxiliary table, and change the check constraint to guarantee that all of the data for the next load can be switched to the next empty partition of the fact table [4]
the second auxiliary table is prepared for new bulk loads again [4]

{operation} [Query Optimizer] partition elimination

process in which SQL Server accesses only those partitions needed to satisfy query filters [4]

{operation} partition switching

{definition} process that switches a block of data from one table or partition to another table or partition [4]
types of switches

reassign all data from a nonpartitioned table to an empty existing partition of a partitioned table [4]
switch a partition of one partitioned table to a partition of another partitioned table [4]
reassign all data from a partition of a partitioned table to an existing empty nonpartitioned table [4]

{benefit} improves query performance [1]

by partitioning a table across filegroups [1]

specific ranges of data can be placed on different disk spindles [1]

can improve I/O performance [1]

⇐ the disk storage is already configured as a RAID 10 or RAID 5 array [1]

⇒ this usually has little benefit [1]

using a mix of fast solid state storage for recent, frequently accessed data, and mechanical disks for older, less queried rows [1]

use partitioning to balance disk performance against storage costs [1]

biggest performance gain from partitioning in a data warehouse is realized when queries return a range of rows that are filtered on the partitioning key [1]

the query optimizer can eliminate partitions that are not within the filter range [1]

dramatically reduce the number of rows that need to be read [1]

reduces contention [3]

can reduce the number of rows included in a table scan [3]

{benefit} more granular manageability [1]

some maintenance operations can be performed at partition level instead of on the whole table [1]

e.g. indexes can be created and rebuilt on a per-partition basis [1]
e.g. compression can be applied to individual partitions [1]
e.g. by mapping partitions to filegroups, partitions can be backed up and restored independently [1]

enables to back up older data once and then configure the backed up partitions as read-only [1]
future backups can be limited to the partitions that contain new or updated data [1]

{benefit} improved data load performance

enables loading many rows very quickly by switching a staging table with a partition

can dramatically reduce the time taken by ETL data loads [1]

with the right planning, it can be achieved with minimal requirements to drop or rebuild indexes [1]

{best practice} partition large fact tables

tables of around 50 GB or more
⇐ in general, fact tables benefit from partitioning more than dimension tables [1]

{best practice} partition on an incrementing date key [1]

assures that the most recent data are in the last partition and the earliest data are in the first partition [1]

{best practice} design the partition scheme for ETL and manageability [1]

the query performance gains realized by partitioning are small compared to the manageability and data load performance benefits [1]

ideally partitions should reflect the ETL load frequency

because this simplifies the load process [1]
merge partitions periodically to reduce the overall number of partitions (for example, at the start of each year [1]

could merge the monthly partitions for the previous year into a single partition for the whole year [1]

{best practice} maintain an empty partition at the start and end of the table [1]

simplifies the loading of new rows [1]
when new records need to be added, split the empty partition

⇐ to create two empty partitions)

then switch the staged data with the first empty partition [1]

⇐ loads the data into the table and leaves the second empty partition you created at the end of the table, ready for the next load [1]

a similar technique can be used to archive or delete obsolete data at the beginning of the table [1]

{best practice} chose the proper granularity

it should be aligned to the business requirements [2]

{best practice} create at least one filegroup in addition to the primary one

set it as the default filegroup

data tables are thus separated from system tables [2]

creating dedicated filegroups for extremely large fact tables [2]

place the fact tables on their own logical disks [2]

use a file and a filegroup separate from the fact and dimension tables [2]

{exception} staging tables that will be switched with partitions to perform fast loads [2]

staging tables must be created on the same filegroup as the partition with which they will be switched [2]

{def} partition scheme

a scheme that maps partitions to filegroups

{def} partition function
- object that maps rows to partitions by using values from specific columns (aka partitioning columns)
- performs logical mapping
{def} aligned index

index built on the same partition scheme as its base table [4]

if all indexes are aligned with their base table, switching a partition is a metadata operation only [4]

⇒ it’s very fast [4]

Previous Post <<||>> Next Post

References:
[1] 20467A - Designing Business Intelligence Solutions with Microsoft SQL Server 2012

[2] 20463C - Implementing a Data Warehouse with Microsoft SQL Server

[3] 10777A - Implementing a Data Warehouse with Microsoft SQL Server 2012

[4] Dejan Sarka et al (2012) Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 (Training Kit)

[5] Microsoft Learn (2009) How to Implement an Automatic Sliding Window in a Partitioned Table on SQL Server 2005 [link]

🏭💠🗒️Microsoft Fabric: SQL Analytics Endpoint [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 12-Feb-2024

[Microsoft Fabric] SQL Analytics Endpoint

{def} a service that listens actively for requests providing a SQL-based experience for lakehouse Delta tables [1]
- enables to query data in the lakehouse using T-SQL language and TDS protocol [1]
- created by default for each lakehouses and mirrored databases provisioned in a workspace
- manages the automatically generated tables so the workspace users can't modify them [1]
{feature} a background process is responsible for

scanning lakehouse for changes [1]

automatically generates a table in the appropriate schema for every Delta table in the Lakehouse

tables are created with a minor delay [1]
data is refreshed automatically

the amount of time it takes to refresh the table is related to how optimized the Delta tables are [1]

{alternative} one can manually force a refresh of the automatic metadata scanning in the Fabric portal [1]

keeping SQL analytics endpoint up-to-date for all the changes committed to lakehouses in a workspace [1]

{characteristic} transparently managed

⇐ the sync process is transparently managed by Microsoft Fabric platform [1]
when a change is detected in the lakehouse

a background process updates metadata [1]
the SQL analytics endpoint reflects the changes committed to lakehouse tables [1]

{characteristic} low latency

under normal operating conditions, the lag between a lakehouse and SQL analytics endpoint is less than one minute [1]
the actual length of time can vary from a few seconds to minutes [1]

the metadata synchronization is automatically triggered when the SQL Endpoint is opened [3]

ensures the SQL Analytics Endpoint remains up to date without the need for manual syncs [3]

⇐ provides a seamless experience [3]

{feature} can be enriched by adding database objects

schemas
views
procedures
other database objects

{feature} automatic metadata discovery

tracks changes committed to lakehouses [1]

is a single instance per Fabric workspace [1]
{issue} increased latency for changes to sync between lakehouses and SQL analytics endpoint, it could be due to large number of lakehouses in one workspace [1]

{resolution} migrate each lakehouse to a separate workspace [1]

allows automatic metadata discovery to scale [1]

{issue} changes committed to a lakehouse are not visible in the associated SQL analytics endpoint

create a new table in the [1]
many committed rows may not be visible in the endpoint [1]
⇐ may be impacted by ETL processing that generate large volumes of changes
{recommendation} initiate an on-demand metadata sync, triggered from the SQL query editor Refresh ribbon option [1]

forces an on-demand metadata sync, rather than waiting on the background metadata sync to finish [1]

{issue} if there's no maintenance scheduled for the parquet files, this can result in read overhead and this impacts time it takes to sync changes to SQL analytics endpoint [1]

{recommendation}schedule regular lakehouse table maintenance operations [1]

{limitation} not all Delta features are understood by the automatic sync process [1]

{feature} endpoint reprovisioning

if the initial provisioning attempt fails, users have the option to try again [2]

⇐ without the need to create an entirely new lakehouse [2]
{benefit} empowers users to self-mitigate provisioning issues in convenient way in the UI avoiding the need for complete lakehouse re-creation [2]

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Microsoft Fabric: SQL analytics endpoint performance considerations [link]
[2] Microsoft Learn (2024) Microsoft Fabric: What is the SQL analytics endpoint for a lakehouse? [link]
[3] Microsoft Learn (2024) Microsoft Fabric: What’s new in the Fabric SQL Analytics Endpoint? [link]

Resources:

[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

06 January 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VII: Things That Don't Work) 🆕

Microsoft does relatively a good job in documenting what doesn't work in Microsoft Fabric's SQL Databases. There's a good overview available already in the documentation, though beyond this the current post lists my finding while testing the previously written code on this blog,

USE Database

The standard syntax allows to change via USE the database context to the specified database or database snapshot. Unfortunately, this syntax doesn't seem to be supported currently and unfortunately many scripts seem to abuse of it. Thus, the following line of code throws an error:

-- changing the context
USE master;
GO
USE tempdb;

"Msg 40508, Level 16, State 1, Line 1, USE statement is not supported to switch between databases. Use a new connection to connect to a different database"

However, one can use the 3-part naming convention to reference the various objects:

-- sys metadata - retrieving the database files

SELECT *

FROM tempdb.sys.database_files dbf

ORDER BY name;

Even if the tempdb is not listed in the sys.databases table, it's still available for querying, which can prove helpful for troubleshooting.

DBCC commands

The documentation warns that some DBCC commands won't work, though in some cases there are also alternatives. For example:

-- clearing the procedure cache via DBCC
DBCC FREEPROCCACHE;

Output:

"Msg 2571, Level 14, State 9, Line 1, User '<user>' does not have permission to run DBCC freeproccache."

Alternatively, one can use the following command, which seems to work:

-- clearing the procedure cash via ALTER
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;

CHECKDB, which checks the logical and physical integrity of all the objects in the specified database, can't be used as well:

-- Checking the logical and physical integrity of a database
DBCC CHECKDB();

Output:

"Msg 916, Level 14, State 2, Line 1, The server principal "..." is not able to access the database "..." under the current security context."

The same error message is received for CHECKTABLE, utility which checks the integrity of all the pages and structures that make up the table (or indexed view):

-- checking a table's integrity
DBCC CHECKTABLE ('SalesLT.Address');

Output:

"Msg 916, Level 14, State 2, Line 2, The server principal "..." is not able to access the database "..." under the current security context."

A similar error messages is received for SQLPERF, which provides transaction log space usage statistics for all databases:

-- retrieving the LOGSPACE information for all databases
DBCC SQLPERF (LOGSPACE);

Output:

"Msg 297, Level 16, State 10, Line 1, The user does not have permission to perform this action."

There are however DBCC commands like SHOW_STATISTICS or SHRINKDATABASE which do work.

-- current query optimization statistics
DBCC SHOW_STATISTICS('SalesLT.Address','PK_Address_AddressID');

Output:

Name	Updated	Rows	Rows Sampled	Steps	Density	Average key length	String Index	Filter Expression	Unfiltered Rows	Persisted Sample Percent
PK_Address_AddressID	Dec 21 2024 3:02AM	450	450	197	1	4	NO		450	0

SHRINKDATABASE shrinks the size of the data and log files in the specified database:

-- shrinking database
DBCC SHRINKDATABASE([AdventureWorks01-...]) WITH NO_INFOMSGS;

Update 29-Jan-2025: According to an answer from Ask the Expert session on Fabric Database [3], Microsoft seems to be working in bringing more DBCC features to SQL databases.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) SQL Server: USE <database> [link]
[2] Microsoft Learn (2024) Database console commands [link]
[3] Microsoft Reactor (2025) Ask The Expert - Fabric Edition - Fabric Databases [link]

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VI: Index Usage Analysis)🆕

There are several system dynamic management views (DMV) available in SQL Server, Azure SQL Server and now in SQL databases that allow to gather more information about indexes' fragmentation and usage. Let's look at the most important information available based on the indexes create in the previous posts. As the data were probably purged from the views, it's needed to run first the select queries based on the SalesLT.Product from the previous post. This step is important, otherwise the DMVs might return no records!

One starting point is to use the sys.dm_db_index_physical_stats DMV to look at the indexes' size and fragmentation information for a given table (or view). The table is used usually as starting point for analyzing indexes' fragmentation and then defragment the indexes with high fragmentation.

-- sys metadata - index & data size and fragmentation information for the data and indexes of the specified table or view
SELECT --db_name() db_name
--, object_name(IND.object_id) table_name
 IND.name index_name
, IND.type_desc
, IPS.page_count
, IPS.record_count
, IPS.index_level
, Cast(IPS.avg_fragmentation_in_percent as decimal(10,2)) avg_fragmentation_perc
, Cast(IPS.avg_page_space_used_in_percent as decimal(10,2)) space_used_perc
--, IPS.*
FROM sys.indexes IND
     CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), IND.object_id, IND.index_id, NULL, 'DETAILED') IPS
WHERE IND.object_id = OBJECT_ID(N'SalesLT.Product');

Output:

index_name	type_desc	page_count	record_count	index_level	avg_fragmentation_perc	space_used_perc
PK_Product_ProductID	CLUSTERED	101	295	0	0.99	87.90
PK_Product_ProductID	CLUSTERED	1	101	1	0.00	16.20
AK_Product_rowguid	NONCLUSTERED	2	295	0	50.00	74.69
AK_Product_rowguid	NONCLUSTERED	1	2	1	0.00	0.59
AK_Product_ProductNumber	NONCLUSTERED	2	295	0	50.00	85.79
AK_Product_ProductNumber	NONCLUSTERED	1	2	1	0.00	0.49
AK_Product_Name	NONCLUSTERED	3	295	0	33.33	87.32
AK_Product_Name	NONCLUSTERED	1	3	1	0.00	1.67
IX_SalesLT_Product_Color	NONCLUSTERED	1	295	0	0.00	79.24
IX_SalesLT_Product_Color_Size	NONCLUSTERED	1	295	0	0.00	94.12
IX_SalesLT_Product_ListPrice_IC	NONCLUSTERED	4	295	0	0.00	86.60
IX_SalesLT_Product_ListPrice_IC	NONCLUSTERED	1	4	1	0.00	1.01

In a second step one can look at the sys.dm_db_index_usage_stats DMV which provides the counts of the different types of index operations and the time each type of operation was last performed:

-- sys metadata - counts of different types of index operations and the time each type of operation was last performed.
SELECT -- db_name() db_name
--, object_name(IND.object_id) table_name
 IND.name
, IND.type_desc
, IUS.user_seeks 
, IUS.user_scans
, IUS.user_lookups 
, IUS.user_updates
, IUS.last_user_seek
, IUS.last_user_scan 
, IUS.last_user_lookup
, IUS.last_user_update
FROM sys.dm_db_index_usage_stats IUS
     JOIN sys.indexes IND
       ON IUS.index_id = IND.index_id
WHERE IND.object_id = OBJECT_ID(N'SalesLT.Product');

Output:

name	type_desc	user_seeks	user_scans	user_lookups	user_updates	last_user_seek	last_user_scan	last_user_lookup
PK_Product_ProductID	CLUSTERED	0	10	15	0		2025-01-06T14:23:54	2025-01-06T14:23:54
IX_SalesLT_Product_Color_Size	NONCLUSTERED	11	0	0	0	2025-01-06T14:23:54
IX_SalesLT_Product_ListPrice_IC	NONCLUSTERED	8	0	0	0	2025-01-06T13:38:03

Finally, it might be useful to look also at the sys.dm_db_index_operational_stats DMV which returns the current lower-level I/O, locking, latching, and access method activity for each partition of a table or index in the database (see the documentation for the full list of attrbutes):

-- sys metadata - index operations stats
SELECT -- db_name() db_name
--, object_name(IND.object_id) table_name
 IND.name index_name
, IND.type_desc
, IOS.range_scan_count
, IOS.singleton_lookup_count
, IOS.leaf_insert_count
, IOS.leaf_delete_count
, IOS.leaf_update_count
, IOS.nonleaf_insert_count
, IOS.nonleaf_delete_count
, IOS.nonleaf_update_count
FROM sys.indexes IND
     CROSS APPLY sys.dm_db_index_operational_stats(DB_ID(), IND.object_id, IND.index_id, NULL) IOS
WHERE IND.object_id = OBJECT_ID(N'SalesLT.Product')
 AND IOS.range_scan_count<>0
ORDER BY IND.name;

Output:

index_name	type_desc	range_scan_count	singleton_lookup_count	leaf_insert_count	leaf_delete_count	leaf_update_count	nonleaf_insert_count	nonleaf_delete_count	nonleaf_update_count
IX_SalesLT_Product_Color_Size	NONCLUSTERED	11	0	0	0	0	0	0	0
IX_SalesLT_Product_ListPrice_IC	NONCLUSTERED	8	0	0	0	0	0	0	0
PK_Product_ProductID	CLUSTERED	10	64	0	0	0	0	0	0

For more information on these DMVs check the documentation.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) SQL Server: sys.dm_db_index_physical_stats [link]
[2] Microsoft Learn (2024) SQL Server: sys.dm_db_index_usage_stats [link]
[3] Microsoft Learn (2024) SQL Server: sys.dm_db_index_operational_stats [link]

04 January 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part V: Manual Index Maintenance) 🆕

Indexes' maintenance in Microsoft Fabric's SQL databases is supposed to happen automatically in the background via automatic tuning options feature, though the whole functionality is still in its early phases, and therefore many questions regarding the whole process may arise. Probably the most important question is whether indexes can still be created, respectively maintained manually. That's useful for temporary or even periodic workloads, where maybe organizations might still want to maintain indexes manually.

The tests made below are based on the SalesLT.Product from AdventureWorkds database available in Microsoft Fabric. The target was to create several indexes that could be used for the various testing purposes. Each set of the below scripts was run 5-10 times until records appeared in the sys.dm_db_missing_index_details table for each test case (see further below):

-- batch 1: filter on single column (to be run 5-10 times)
SELECT *
FROM SalesLT.Product 
WHERE Color = 'Red'

SELECT *
FROM SalesLT.Product 
WHERE Color = 'Black'

SELECT *
FROM SalesLT.Product 
WHERE Color = 'White'

-- batch 2: filter on two columns (to be run 5-10 times)
SELECT *
FROM SalesLT.Product 
WHERE Color = 'Red'
  AND Size = '58'

SELECT *
FROM SalesLT.Product 
WHERE Color = 'Black'
  AND Size = '58'

SELECT *
FROM SalesLT.Product 
WHERE Color = 'White'
     AND Size = '58'

-- batch 3: filter with column selection (to be run 5-10 times)
SELECT ProductNumber, Name, Color, ListPrice
FROM SalesLT.Product 
WHERE ListPrice BETWEEN 50 AND 55

SELECT ProductNumber, Name, Color, ListPrice
FROM SalesLT.Product 
WHERE ListPrice BETWEEN 100 and 105

Once the scripts run, one can look at the records created in the above considered dynamic management view:

-- sys metadata -  missing indexes
SELECT MID.statement AS table_name
, MID.equality_columns
, MID.inequality_columns
, MID.included_columns
--, MIG.index_group_handle
--, MIG.index_handle
FROM sys.dm_db_missing_index_details MID 
    JOIN sys.dm_db_missing_index_groups MIG 
     ON MID.index_handle =  MIG.index_handle
ORDER BY MIG.index_group_handle
, MIG.index_handle

Output:

table_name	equality_columns	inequality_columns	included_columns
[AdventureWorks01-...].[SalesLT].[Product]	[Color]
[AdventureWorks01-...].[SalesLT].[Product]	[Color], [Size]
[AdventureWorks01-...].[SalesLT].[Product]		[ListPrice]	[Name], [ProductNumber], [Color]

The next step is to create one of the indexes (please note that database's name must be replaced accordingly or used only the 2-part naming convention - schema & table name ):

-- create index on Color
CREATE INDEX IX_SalesLT_Product_Color 
ON [AdventureWorks01-...].[SalesLT].[Product] (Color);

Once the script was run, all the records related to the SalesLT.Product disappeared from the dynamic management view. Therefore, it might be a good idea to take a snapshot with view's data before creating any indexes manually. Probably the same behavior should be expected when the indexes are created by the system.

-- create index on Color & Size
CREATE INDEX IX_SalesLT_Product_Color_Size
ON [SalesLT].[Product] (Color, Size);

-- create index on ListPrice with included columns
CREATE INDEX IX_SalesLT_Product_ListPrice_IC
ON [SalesLT].[Product] (ListPrice) INCLUDE(ProductNumber, Name, Color);

One can use the following query based on the meta.vIndexes (created in a previous post) to look at the indexes created:

-- sys metadata - index columns
SELECt IND.db_name
, IND.schema_name
, IND.table_name
, IND.index_name
, IND.index_type
, IND.principal_type
, IND.auto_created
FROM meta.vIndexes IND
WHERE IND.schema_name = 'SalesLT'
  AND IND.table_name = 'Product'
  AND IND.index_name IN ('IX_SalesLT_Product_Color ','IX_SalesLT_Product_Color_Size'
,'IX_SalesLT_Product_ListPrice_IC')
ORDER BY IND.table_name
, IND.index_name

Output:

db_name	schema_name	table_name	index_name	index_type	principal_type	auto_created
AdventureWorks01-...	SalesLT	Product	IX_SalesLT_Product_Color	NONCLUSTERED	S	False
AdventureWorks01-...	SalesLT	Product	IX_SalesLT_Product_Color_Size	NONCLUSTERED	S	False
AdventureWorks01-...	SalesLT	Product	IX_SalesLT_Product_ListPrice_IC	NONCLUSTERED	S	False

After this model can be created further indexes as needed. It's always a good idea to take a "copy" of the indexes created (or keep a history of the scripts run for indexes' maintenance). This best practice is now more important, when the system can drop indexes as it considers fit.

Don't forget to clean up the changes made if the indexes aren't needed anymore:

-- cleaning after
DROP INDEX IF EXISTS SalesLT.IX_SalesLT_Product_Color;
DROP INDEX IF EXISTS SalesLT.IX_SalesLT_Product_Color_Size;
DROP INDEX IF EXISTS SalesLT.IX_SalesLT_Product_ListPrice_IC;

So, after these tests, the standard syntax for index's maintenance seems to work also on SQL databases, with all the implications deriving from this (e.g. porting of scripts, database objects, etc.)

Happy coding!

Previous Post <<||>> Next Post

SQL Troubles

Pages

11 June 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 18: How Fast Can AI Surpass Human Intelligence?)

17 March 2025

🏭🗒️Microsoft Fabric: Z-Order [Notes]

🏭🗒️Microsoft Fabric: V-Order [Notes]

15 March 2025

💫🗒️ERP Systems: Microsoft Dynamics 365's Business Performance Analytics (BPA) [notes]

14 February 2025

🏭🧊🗒️Microsoft Fabric: Partitions in Lakehouses [Notes]

13 February 2025

🧊💠🗒️Data Warehousing: Table Partitioning in SQL Server [Notes]

🏭💠🗒️Microsoft Fabric: SQL Analytics Endpoint [Notes]

06 January 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VII: Things That Don't Work) 🆕

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VI: Index Usage Analysis)🆕

04 January 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part V: Manual Index Maintenance) 🆕

About Me