SQL Troubles: databases

Showing posts with label databases. Show all posts

21 June 2025

🏭🗒️Microsoft Fabric: Result Set Caching in SQL Analytics Endpoints [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 21-Jun-2025

[Microsoft Fabric] Result Set Caching in SQL Analytics Endpoints

{def} built-in performance optimization for Warehouse and Lakehouse that improves read latency [1]

fully transparent to the user [3]
persists the final result sets for applicable SELECT T-SQL queries

caches all the data accessed by a query [3]
subsequent runs that "hit" cache will process just the final result set

can bypass complex compilation and data processing of the original query[1]

⇐ returns subsequent queries faster [1]

the cache creation and reuse is applied opportunistically for queries

works on

warehouse tables
shortcuts to OneLake sources
shortcuts to non-Azure sources

the management of cache is handled automatically [1]

regularly evicts cache as needed

as data changes, result consistency is ensured by invalidating cache created earlier [1]

{operation} enable setting

via ALTER DATABASE <database_name> SET RESULT_SET_CACHING ON

{operation} validate setting

via SELECT name, is_result_set_caching_on FROM sys.databases

{operation} configure setting

configurable at item level

once enabled, it can then be disabled

at the item level
for individual queries

e.g. debugging or A/B testing a query

via OPTION ( USE HINT ('DISABLE_RESULT_SET_CACHE')

{default} during the preview, result set caching is off for all items [1]

[monitoring]

via Message Output

applicable to Fabric Query editor, SSMS
the statement "Result set cache was used" is displayed after query execution if the query was able to use an existing result set cache

via queryinsights.exec_requests_history system view

result_cache_hit displays indicates result set cache usage for each query execution [1]

{value} 2: the query used result set cache (cache hit)
{value} 1: the query created result set cache
{value} 0: the query wasn't applicable for result set cache creation or usage [1]

{reason} the cache no longer exists
{reason} the cache was invalidated by a data change, disqualifying it for reuse [1]
{reason} query isn't deterministic

isn't eligible for cache creation [1]

{reason} query isn't a SELECT statement

[warehousing]

{scenario} analytical queries that process large amounts of data to produce a relatively small result [1]
{scenario} workloads that trigger the same analytical queries repeatedly [1]

the same heavy computation can be triggered multiple times, even though the final result remains the same [1]

Previous Post <<||>> Next Post

References:

[1] Microsoft Learn (2025) Result set caching (preview) [link]

[2] Microsoft Fabric Update Blog (2025) Result Set Caching for Microsoft Fabric Data Warehouse (Preview) [link|aka]

[3] Microsoft Learn (2025) In-memory and disk caching [link]

[4] Microsoft Learn (2025) Performance guidelines in Fabric Data Warehouse [link]

Resources:
[R1] Microsoft Fabric (2025) Fabric Update - June 2025 [link]

Acronyms:

MF - Microsoft Fabric

SSMS - SQL Server Management Studio

09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premise SQL Server, for other platforms please refer to the documentation.

Last updated: 9-Apr-2025

[SQL Server 2005] Tempdb database

{def} system database available as a global resource to all users connected to a Database Engine instance [1]

does not persist after SQL Server shuts down

created at restart using the specifications from the model database
doesn’t need crash recovery
requires rollback capabilities

{scenario} when updating a row in a global temporary table and rolling back the transaction [12]

there's no way to undo this change unless was logged the ‘before value’ of the update [12]
there is no need to log the ‘after value’ of the update because this value is only needed when the transaction needs to be ‘redone’ which happens during database recovery [12]

{scenario} during an Insert operation

only redo information is needed
when insert a row in say a global temporary table, the actual ‘row value’ is not logged because SQL Server does not need the ‘row’ to undo except it needs to set the offsets within the page appropriately or if this insert had caused a new page to be allocate, de-allocate the page [12]

only redo information is needed

not all objects in TempDB are subject to logging [18]

{characteristic} critical database

⇒ needs to be configured adequately
has the highest level of create and drop actions [5]
under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks [5]
workers use tempdb like any other database

any worker can issue I/O to and from tempdb as needed [5]

if the tempdb database cannot be created, SQL Server will not start [7]

{recommendation} start SQL Server by using the (-f) startup parameter

files

data file:

{default} dempdb.mdf

log file

{default} templog.ldf

{operation} changing the tempdb database

{step} use ALTER database statement and MODIFY FILE clause to change the physical file names of each file in the tempdb database to refer to the new physical location [7]
{step} stop and restart SQL Server

{operation} create file group

only one file group for data and one file group for logs are allowed.
{default} the number of files is set to 1
can be a high contention point for internal tracking structures

often it's better to create multiple files for tempdb

multiple files can be created for each file group

{recommendation} set the number of files to match the number of CPUs that are configured for the instance

⇐ if the number of logical processors is less than or equal to eight [1]

otherwise, use eight data files [1]
contention is still observed, increase the number of data files by multiples of four until the contention decreases to acceptable levels, or make changes to the workload [1]

⇐ it’s not imperative

{scenario}add a file per CPU (aka multiple files)

allows the file group manager to

pick the next file

there are some optimizations here for tempdb to avoid contention and skip to the next file when under contention [4]

then use the target location to start locating free space [4]

⇐ SQL Server creates a logical scheduler for each of the CPUs presented to it (logical or physical)

allows each of the logical schedulers to loosely align with a file [4]

since can be only 1 active worker per scheduler this allows each worker to have its own tempdb file (at that instant) and avoid allocation contention [4]

internal latching and other activities achieve better separation (per file)

⇒ the workers don’t cause a resource contention on a single tempdb file

rare type of contention [5]

{misconception}

leads to new I/O threads per file [5]

{scenario} add more data files

may help to solve potential performance problems that are due to I/O operations.
helps to avoid a latch contention on allocation pages

⇐ manifested as a UP-latch)

{downside} having too many files

increases the cost of file switching
requires more IAM pages
increases the manageability overhead.

{recommendation} configure the size of the files to be equal

⇐ to better use the allocation mechanism (proportional fill),

{recommendation} create tempdb files striped across fast disks

the size of the tempdb affect the performance of the database
{event} server restarts

the tempdb file size is reset to the configured value (the default is 8 MB).

{feature} auto grow

temporary

⇐ unlike other types of databases)

when the tempdb grows, all transactional activity may come to an halt

because TempDB is used by most operations and these activities will get blocked until more disk space gets allocated to the TempDB [12]
{recommendation} pre-allocate the size of the TempDB that matches the needs of workload [12]

{recommendation} should only be used more for exceptions rather than as a strategy [12]

transactions

lose the durability attribute
can be rolled back

there is no need to REDO them because the contents of tempdb do not persist across server restarts

because the transaction log does not need to be flushed, transactions are committed faster in tempdb than in user databases. In tempdb, transactions

most internal operations on tempdb do not generate log records because there is no need to roll back

{restriction} add filegroups [1]
{restriction} remove the primary filegroup, primary data file, or log file [1]
{restriction} renaming the database or primary filegroup [1]
{restriction} back up or restore the database [1]
{restriction} change collation [1]

the default collation is the server collation [1]

{restriction} change the database owner [1]

owned by sa

{restriction} create a database snapshot [1]
{restriction} drop the database [1]
{restriction} drop the quest user from the database [1]
{restriction} enable CDC
{restriction} participate in database mirroring [1]
{restriction} can only be configured in the simple recovery model

⇒ the value can’t be changed

{restriction} auto shrink is not allowed for tempdb

database shrink and file shrink capabilities are limited

⇐ because many of the hidden objects stored in tempdb cannot be moved by shrink operations

{restriction} the database CHECKSUM option cannot be enabled
{restriction} database snapshots cannot be created
{restriction} DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported.

only offline checking for DBCC CHECKTABLE is performed

⇒ TAB-S lock is needed
here are internal consistency checks that occur when tempdb is in use

if the checks fail, the user connection is broken and the tempdb space used by the connection is freed.

{restriction} set the database to OFFLINE
{restriction} set the database or primary filegroup to READ_ONLY [1]

used by

{feature} query
{feature} temporary tables
{feature} table variables
{feature} table-valued functions
{feature} user-defined functions
{feature} online/offline index creation
{feature} triggers
{feature} cursors
{feature} RCSI
{feature} MARS
{feature} DBCC CHECK
{feature} LOB parameters
{feature} Service Broker and event notification
{feature} XML and LOB variable
{feature} query notifications
{feature} database mail

used to store

user objects

objects explicitly created

user objects that can be created in a user database can also be created in tempdb [1]

{limitation} there's no durability guarantee [1]
{limitation} dropped when the Database Engine instance restarts [1]

{type} global and local temporary tables

correspond to ## and # tables and table variables created explicitly by the application [12]

REDO information is not logged

{type} indexes on global and local temporary tables
{type} table variables
{type} tables returned in table-valued functions
{type} cursors
improved caching for temporary objects

temp objects

only cached when none of the following conditions is violated: [12]

named constraints are not created

DDL statements that affect the table are not run after the temp table has been created

e.g. CREATE INDEX or CREATE STATISTICS statements

not created by using dynamic SQ

e.g. sp_executesql N'create table #t(a int)'

internal objects

objects created internally by SQL Server

each object uses a minimum of nine pages [1]

an IAM page and an eight-page extent [1]

{type} work tables

store intermediate results for spools, cursors, sorts, and temporary LOB storage [1]

{type} work files

used for for hash join or hash aggregate operations [1]

{type} intermediate sort results

used for operations

creating or rebuilding indexes

if SORT_IN_TEMPDB is specified

certain GROUP BY, ORDER BY
UNION queries

{restriction} applications cannot directly insert into or delete rows from them
{restriction} metadata is stored in memory
{restriction} metadata does not appear in system catalog views such as sys.all_objects.
{restriction} considered to be hidden objects
{restriction} updates to them do not generate log records
{restriction} page allocations do not generate log records unless on a sort unit. If the statement fails, these objects are deallocated.
each object occupies at least nine pages (one IAM page and eight data pages) in tempdb
used

to store intermediate runs for sort
to store intermediate results for hash joins and hash aggregates.
to store XML variables or other LOB data type variables

e.g. text, image, ntext, varchar(max), varbinary(max), and all others

by queries that need a spool to store intermediate results
by keyset cursors to store the keys
by static cursors to store a query result
by Service Broker to store messages in transit
by INSTEAD OF triggers to store data for internal processing
by any feature that uses the above mentioned operations

version stores

collections of data pages that hold the data rows that support row versioning [1]
{type} common version store
{type} online index build version store

{} storage

I/O characteristics

{requirement} read after rights

{def} the ability of the subsystem to service read requests with the latest data image when the read is issued after any write is successfully completed [7]

{recommendation} writer ordering

{def} the ability of the subsystem to maintain the correct order of write operations [7]

{recommendation} torn I/O prevention

{def} the ability of the system to avoid splitting individual I/O requests [7]

{requirement} physical sector alignment and size

devices are required to support sector attributes permitting SQL Server to perform writes on physical sector-aligned boundaries and in multiples of the sector size [7]

can be put on specialty systems

⇐ they can’t be used for other databases
RAM disks

non-durable media
support double RAM cache

one in the buffer pool and one on the RAM disk

⇐ directly takes away from the buffer pool’s total possible size and generally decreases the performance [7]

give up RAM

implementations of RAM disks and RAM-based files caches

solid state

high speed subsystems
{recommendation} confirm with the product vendor to guarantee full compliance with SQL Server I/O needs [7]

sort errors were frequently solved by moving tempdb to a non-caching local drive or by disabling the read caching mechanism [8]

{feature} logging optimization

avoids logging the "after value" in certain log records in tempdb

when an UPDATE operation is performed without this optimization, the before and after values of the data are recorded in the log file

can significantly reduce the size of the tempdb log as well as reduce the amount of I/O traffic on the tempdb log device

{feature} instant data file initialization

isn’t zeroing out the NTFS file when the file is created or when the size of the file is increased
{benefit} minimizes overhead significantly when tempdb needs to auto grow.

without this, auto grow could take a long time and lead to application timeout.

reduces the impact of database creation [5]

because zero’s don’t have to be stamped in all bytes of a database file, only the log files [5]

reduces the gain from using multiple threads during database creation. [5]

{feature} proportional fill optimization

reduces UP latch contention in tempdb
when there are multiple data files in tempdb, each file is filled in proportion to the free space that is available in the file so that all of the files fill up at about the same time

⇐ accomplished by removing a latch that was taken during proportional fill.

{feature} deferred drop in tempdb

when a large temporary table is dropped by an application, it is handled by a background task and the application does not have to wait

⇒ faster response time to applications.

{feature} worktable caching

{improvement} when a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated

⇐ in addition, the first nine pages for the work table are kept.

{feature} caches temporary objects

when a table-valued functions, table variables, or local temporary tables are used in a stored procedure, function, or trigger, the frequent drop and create of these temporary objects can be time consuming

⇐ this can cause contentions on tempdb system catalog tables and allocation pages

Previous Post <<||>> Next Post

References

[1] Microsoft Learn (2024) SQL Server: Tempdb database [link]
[2] Wei Xiao et al (2006) Working with tempdb in SQL Server 2005

[4] CSS SQL Server Engineers (2009) SQL Server TempDB – Number of Files – The Raw Truth [link]
[5] SQL Server Support Blog (2007) SQL Server Urban Legends Discussed [link]

[6] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]

[7] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
[8] Microsoft Learn (2023) SQL Server: SQL Server diagnostics detects unreported I/O problems due to stale reads or lost writes [link]

[12a] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: Simple Example), by Sunil Agarwal [link]
[12b] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: logical structure), by Sunil Agarwal [link]

[18] Simple Talk (2020) Temporary Tables in SQL Server, by Phil Factor [link]

[19] Microsoft Learn (2023) SQL Server: Recommendations to reduce allocation contention in SQL Server tempdb database [link]

Acronyms:
DB - database
DDL - Data Definition Language
I/O - Input/Output

LOB - large object
MARS - Multiple Active Result Sets

NTFS - New Technology File System

RAM - Random-Access Memory

RCSI - Read Committed Snapshot Isolation

26 March 2025

💠🏭🗒️Microsoft Fabric: Polaris SQL Pool [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!

Unfortunately, besides the references papers, there's almost no material that could be used to enhance the understanding of the concepts presented.

Last updated: 26-Mar-2025

Read and Write Operations in Polaris [2]

[Microsoft Fabric] Polaris SQL Pool

{def} distributed SQL query engine that powers Microsoft Fabric's data warehousing capabilities

designed to unify data warehousing and big data workloads while separating compute and state for seamless cloud-native operations
based on a robust DCP

designed to execute read-only queries in a scalable, dynamic and fault-tolerant way [1]
a highly-available micro-service architecture with well-defined responsibilities [2]

data and query processing is packaged into units (aka tasks)

can be readily moved across compute nodes and re-started at the task level

widely-partitioned data with a flexible distribution model [2]
a task-level "workflow-DAG" that is novel in spanning multiple queries [2]
a framework for fine-grained monitoring and flexible scheduling of tasks [2]

{component} SQL Server Front End (SQL-FE)

responsible for

compilation
authorization
authentication
metadata

used by the compiler to

{operation} generate the search space (aka MEMO) for incoming queries
{operation} bind metadata to data cells
leveraged to ensure the durability of the transaction manifests at commit [2]

only transactions that successfully commit need to be actively tracked to ensure consistency [2]
any manifests and data associated with aborted transactions are systematically garbage-collected from OneLake through specialized system tasks [2]

{component} SQL Server Backend (SQL-BE)

used to perform write operations on the LST [2]

inserting data into a LST creates a set of Parquet files that are then recorded in the transaction manifest [2]
a transaction is represented by a single manifest file that is modified concurrently by (one or more) SQL BEs [2]

SQL BE leverages the Block Blob API provided by ADLS to coordinate the concurrent writes [2]
each SQL BE instance serializes the information about the actions it performed, either adding a Parquet file or removing it [2]

the serialized information is then uploaded as a block to the manifest file
uploading the block does not yet make any visible changes to the file [2]

each block is identified by a unique ID generated on the writing SQL BE [2]

after completion, each SQL BE returns the ID of the block(s) it wrote to the Polaris DCP [2]

the block IDs are then aggregated by the Polaris DCP and returned to the SQL FE as the result of the query [2]

the SQL FE further aggregates the block IDs and issues a Commit Block operation against storage with the aggregated block IDs [2]

at this point, the changes to the file on storage will become effective [2]

changes to the manifest file are not visible until the Commit operation on the SQL FE

the Polaris DCP can freely restart any part of the operation in case there is a failure in the node topology [2]

the IDs of any blocks written by previous attempts are not included in the final list of block IDs and are discarded by storage [2]

[read operations] SQL BE is responsible for reconstructing the table snapshot based on the set of manifest files managed in the SQL FE

the result is the set of Parquet data files and deletion vectors that represent the snapshot of the table [2]

queries over these are processed by the SQL Server query execution engine [2]
the reconstructed state is cached in memory and organized in such a way that the table state can be efficiently reconstructed as of any point in time [2]

enables the cache to be used by different operations operating on different snapshots of the table [2]
enables the cache to be incrementally updated as new transactions commit [2]

{feature} supports explicit user transactions

can execute multiple statements within the same transaction in a consistent way

the manifest file associated with the current transaction captures all the (reconciled) changes performed by the transaction [2]

changes performed by prior statements in the current transaction need to be visible to any subsequent statement inside the transaction (but not outside of the transaction) [2]

[multi-statement transactions] in addition to the committed set of manifest files, the SQL BE reads the manifest file of the current transaction and then overlays these changes on the committed manifests [1]
{write operations} the behavior of the SQL BE depends on the type of the operation.

insert operations

only add new data and have no dependency on previous changes [2]
the SQL BE can serialize the metadata blocks holding information about the newly created data files just like before [2]
the SQL FE, instead of committing only the IDs of the blocks written by the current operation, will instead append them to the list of previously committed blocks

⇐ effectively appends the data to the manifest file [2]

{update|delete operations}

handled differently

⇐ since they can potentially further modify data already modified by a prior statement in the same transaction [2]

e.g. an update operation can be followed by another update operation touching the same rows

the final transaction manifest should not contain any information about the parts from the first update that were made obsolete by the second update [2]

SQL BE leverages the partition assignment from the Polaris DCP to perform a distributed rewrite of the transaction manifest to reconcile the actions of the current operation with the actions recorded by the previous operation [2]

the resulting block IDs are sent again to the SQL FE where the manifest file is committed using the (rewritten) block IDs [2]

{concept} Distributed Query Processor (DQP)

responsible for

distributed query optimization
distributed query execution
query execution topology management

{concept} Workload Management (WLM)

consists of a set of compute servers that are, simply, an abstraction of a host provided by the compute fabric, each with a dedicated set of resources (disk, CPU and memory) [2]

each compute server runs two micro-services

{service} Execution Service (ES)

responsible for tracking the life span of tasks assigned to a compute container by the DQP [2]

{service} SQL Server instance

used as the back-bone for execution of the template query for a given task [2]

⇐ holds a cache on top of local SSDs

in addition to in-memory caching of hot data

data can be transferred from one compute server to another

via dedicated data channels

the data channel is also used by the compute servers to send results to the SQL FE that returns the results to the user [2]
the life cycle of a query is tracked via control flow channels from the SQL FE to the DQP, and the DQP to the ES [2]

{concept} cell data abstraction

the key building block that enables to abstract data stores

abstracts DQP from the underlying store [1]
any dataset can be mapped to a collection of cells [1]
allows distributing query processing over data in diverse formats [1]
tailored for vectorized processing when the data is stored in columnar formats [1]
further improves relational query performance

2-dimenstional

distributions (data alignment)
partitions (data pruning)

each cell is self-contained with its own statistics [1]

used for both global and local QO [1]
cells can be grouped physically in storage [1]
queries can selectively reference either cell dimension or even individual cells depending on predicates and type of operations present in the query [1]

{concept} distributed query processing (DQP) framework

operates at the cell level
agnostic to the details of the data within a cell

data extraction from a cell is the responsibility of the (single node) query execution engine, which is primarily SQL Server, and is extensible for new data types [1], [2]

{concept} dataset

logically abstracted as a collection of cells [1]
can be arbitrarily assigned to compute nodes to achieve parallelism [1]
uniformly distributed across a large number of cells

[scale-out processing] each dataset must be distributed across thousands of buckets or subsets of data objects,
such that they can be processed in parallel across nodes

{concept} session

supports a spectrum of consumption models, ranging from serverless ad-hoc queries to long-standing pools or clusters [1]
all data are accessible from any session [1]

multiple sessions can access all underlying data concurrently [1]

{concept} Physical Metadata layer

new layer introduced in the SQL Server storage engine [2]

16 March 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part XI: Database and Server Properties)

When taking over a SQL Server, respectively database, one of the first checks I do focuses on the overall configuration, going through the UI available for admins to see if I can find anything that requires further investigation. If no documentation is available on the same, I run a few scripts and export their output as baseline.

Especially when documenting the configuration, it's useful to export the database options and properties defined at database level. Besides the collation and probably the recovery mode, typically the rest of the configuration is similar, though in exceptional cases one should expect also surprises that require further investigation!

The following query retrieves in a consolidated way all the options and properties of a SQL database in Microsoft Fabric.

-- database settings/properties 
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') Collation
--, DATABASEPROPERTYEX(DB_NAME(), 'ComparisonStyle')  ComparisonStyle
, DATABASEPROPERTYEX(DB_NAME(), 'Edition') Edition
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiNullDefault') IsAnsiNullDefault
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiNullsEnabled') IsAnsiNullsEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiPaddingEnabled') IsAnsiPaddingEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiWarningsEnabled') IsAnsiWarningsEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsArithmeticAbortEnabled') IsArithmeticAbortEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoClose') IsAutoClose
, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoCreateStatistics') IsAutoCreateStatistics
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoCreateStatisticsIncremental') IsAutoCreateStatisticsIncremental
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoShrink') IsAutoShrink
, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoUpdateStatistics') IsAutoUpdateStatistics
--, DATABASEPROPERTYEX(DB_NAME(), 'IsClone') IsClone
--, DATABASEPROPERTYEX(DB_NAME(), 'IsCloseCursorsOnCommitEnabled') IsCloseCursorsOnCommitEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsDatabaseSuspendedForSnapshotBackup') IsDatabaseSuspendedForSnapshotBackup
, DATABASEPROPERTYEX(DB_NAME(), 'IsFulltextEnabled') IsFulltextEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsInStandBy') IsInStandBy
--, DATABASEPROPERTYEX(DB_NAME(), 'IsLocalCursorsDefault') IsLocalCursorsDefault
--, DATABASEPROPERTYEX(DB_NAME(), 'IsMemoryOptimizedElevateToSnapshotEnabled') IsMemoryOptimizedElevateToSnapshotEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsMergePublished') IsMergePublished
--, DATABASEPROPERTYEX(DB_NAME(), 'IsNullConcat') IsNullConcat
--, DATABASEPROPERTYEX(DB_NAME(), 'IsNumericRoundAbortEnabled') IsNumericRoundAbortEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsParameterizationForced') IsParameterizationForced
--, DATABASEPROPERTYEX(DB_NAME(), 'IsQuotedIdentifiersEnabled') IsQuotedIdentifiersEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsPublished') IsPublished
--, DATABASEPROPERTYEX(DB_NAME(), 'IsRecursiveTriggersEnabled') IsRecursiveTriggersEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsSubscribed') IsSubscribed
--, DATABASEPROPERTYEX(DB_NAME(), 'IsSyncWithBackup') IsSyncWithBackup
--, DATABASEPROPERTYEX(DB_NAME(), 'IsTornPageDetectionEnabled') IsTornPageDetectionEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsVerifiedClone') IsVerifiedClone
--, DATABASEPROPERTYEX(DB_NAME(), 'IsXTPSupported') IsXTPSupported
, DATABASEPROPERTYEX(DB_NAME(), 'LastGoodCheckDbTime') LastGoodCheckDbTime
, DATABASEPROPERTYEX(DB_NAME(), 'LCID') LCID
--, DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') MaxSizeInBytes
, DATABASEPROPERTYEX(DB_NAME(), 'Recovery') Recovery
--, DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') ServiceObjective
--, DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjectiveId') ServiceObjectiveId
, DATABASEPROPERTYEX(DB_NAME(), 'SQLSortOrder') SQLSortOrder
, DATABASEPROPERTYEX(DB_NAME(), 'Status') Status
, DATABASEPROPERTYEX(DB_NAME(), 'Updateability') Updateability
, DATABASEPROPERTYEX(DB_NAME(), 'UserAccess') UserAccess
, DATABASEPROPERTYEX(DB_NAME(), 'Version') Version
--, DATABASEPROPERTYEX(DB_NAME(), 'ReplicaID') ReplicaID

Output:

Collation

Edition

IsAutoCreateStatistics

IsAutoUpdateStatistics

IsFulltextEnabled

LastGoodCheckDbTime

LCID

Recovery

SQLSortOrder

Status

Updateability

UserAccess

Version

SQL_Latin1_General_CP1_CI_AS

FabricSQLDB

12/31/1899

1033

FULL

ONLINE

READ_WRITE

MULTI_USER

981

The query can be run also against the SQL analytics endpoints available for warehouses in Microsoft Fabric.

Output:

Collation

Edition

IsAutoCreateStatistics

IsAutoUpdateStatistics

IsFulltextEnabled

LastGoodCheckDbTime

LCID

Recovery

SQLSortOrder

Status

Updateability

UserAccess

Version

Latin1_General_100_BIN2_UTF8

DataWarehouse

12/31/1899

1033

SIMPLE

ONLINE

READ_WRITE

MULTI_USER

987

Respectively, for lakehouses:

Collation

Edition

IsAutoCreateStatistics

IsAutoUpdateStatistics

IsFulltextEnabled

LastGoodCheckDbTime

LCID

Recovery

SQLSortOrder

Status

Updateability

UserAccess

Version

Latin1_General_100_BIN2_UTF8

LakeWarehouse

12/31/1899

1033

SIMPLE

ONLINE

READ_WRITE

MULTI_USER

987

A similar output is obtained if one runs the query against SQL database's SQL analytics endpoint:

Output:

Collation

Edition

IsAutoCreateStatistics

IsAutoUpdateStatistics

IsFulltextEnabled

LastGoodCheckDbTime

LCID

Recovery

SQLSortOrder

Status

Updateability

UserAccess

Version

Latin1_General_100_BIN2_UTF8

LakeWarehouse

12/31/1899

1033

SIMPLE

ONLINE

READ_WRITE

MULTI_USER

987

SQL databases seem to inherit the collation from the earlier versions of SQL Server.

Another meaningful value for SQL databases is MaxSizeInBytes, which in my environment had a value of 3298534883328 bytes ÷ 1,073,741,824 = 3,072 GB.

There are however also server properties. Here's the consolidated overview:

-- server properties
SELECT --SERVERPROPERTY('BuildClrVersion') BuildClrVersion
 SERVERPROPERTY('Collation') Collation
--, SERVERPROPERTY('CollationID') CollationID
, SERVERPROPERTY('ComparisonStyle') ComparisonStyle
--, SERVERPROPERTY('ComputerNamePhysicalNetBIOS') ComputerNamePhysicalNetBIOS
, SERVERPROPERTY('Edition') Edition
--, SERVERPROPERTY('EditionID') EditionID
, SERVERPROPERTY('EngineEdition') EngineEdition
--, SERVERPROPERTY('FilestreamConfiguredLevel') FilestreamConfiguredLevel
--, SERVERPROPERTY('FilestreamEffectiveLevel') FilestreamEffectiveLevel
--, SERVERPROPERTY('FilestreamShareName') FilestreamShareName
--, SERVERPROPERTY('HadrManagerStatus') HadrManagerStatus
--, SERVERPROPERTY('InstanceDefaultBackupPath') InstanceDefaultBackupPath
, SERVERPROPERTY('InstanceDefaultDataPath') InstanceDefaultDataPath
--, SERVERPROPERTY('InstanceDefaultLogPath') InstanceDefaultLogPath
--, SERVERPROPERTY('InstanceName') InstanceName
, SERVERPROPERTY('IsAdvancedAnalyticsInstalled') IsAdvancedAnalyticsInstalled
--, SERVERPROPERTY('IsBigDataCluster') IsBigDataCluster
--, SERVERPROPERTY('IsClustered') IsClustered
, SERVERPROPERTY('IsExternalAuthenticationOnly') IsExternalAuthenticationOnly
, SERVERPROPERTY('IsExternalGovernanceEnabled') IsExternalGovernanceEnabled
, SERVERPROPERTY('IsFullTextInstalled') IsFullTextInstalled
--, SERVERPROPERTY('IsHadrEnabled') IsHadrEnabled
--, SERVERPROPERTY('IsIntegratedSecurityOnly') IsIntegratedSecurityOnly
--, SERVERPROPERTY('IsLocalDB') IsLocalDB
--, SERVERPROPERTY('IsPolyBaseInstalled') IsPolyBaseInstalled
--, SERVERPROPERTY('IsServerSuspendedForSnapshotBackup') IsServerSuspendedForSnapshotBackup
--, SERVERPROPERTY('IsSingleUser') IsSingleUser
--, SERVERPROPERTY('IsTempDbMetadataMemoryOptimized') IsTempDbMetadataMemoryOptimized
, SERVERPROPERTY('IsXTPSupported') IsXTPSupported
, SERVERPROPERTY('LCID') LCID
, SERVERPROPERTY('LicenseType') LicenseType
, SERVERPROPERTY('MachineName') MachineName
, SERVERPROPERTY('NumLicenses') NumLicenses
, SERVERPROPERTY('PathSeparator') PathSeparator
--, SERVERPROPERTY('ProcessID') ProcessID
, SERVERPROPERTY('ProductBuild') ProductBuild
--, SERVERPROPERTY('ProductBuildType') ProductBuildType
--, SERVERPROPERTY('ProductLevel') ProductLevel
--, SERVERPROPERTY('ProductMajorVersion') ProductMajorVersion
--, SERVERPROPERTY('ProductMinorVersion') ProductMinorVersion
--, SERVERPROPERTY('ProductUpdateLevel') ProductUpdateLevel
--, SERVERPROPERTY('ProductUpdateReference') ProductUpdateReference
--, SERVERPROPERTY('ProductUpdateType') ProductUpdateType
, SERVERPROPERTY('ProductVersion') ProductVersion
, SERVERPROPERTY('ResourceLastUpdateDateTime') ResourceLastUpdateDateTime
, SERVERPROPERTY('ResourceVersion') ResourceVersion
, SERVERPROPERTY('ServerName') ServerName
, SERVERPROPERTY('SqlCharSet') SqlCharSet
, SERVERPROPERTY('SqlCharSetName') SqlCharSetName
, SERVERPROPERTY('SqlSortOrder') SqlSortOrder
, SERVERPROPERTY('SqlSortOrderName') SqlSortOrderName
, SERVERPROPERTY('SuspendedDatabaseCount') SuspendedDatabaseCount

Output (consolidated):

Property	SQL database	Warehouse	Lakehouse
Collation	SQL_Latin1_General_CP1_CI_AS	SQL_Latin1_General_CP1_CI_AS	SQL_Latin1_General_CP1_CI_AS
ComparisonStyle	196609	196609	196609
Edition	SQL Azure	SQL Azure	SQL Azure
EngineEdition	12	11	11
InstanceDefaultDataPath	NULL	NULL	NULL
IsAdvancedAnalyticsInstalled	1	1	1
IsExternalAuthenticationOnly	1	0	0
IsExternalGovernanceEnabled	1	1	1
IsFullTextInstalled	1	0	0
IsXTPSupported	1	1	1
LCID	1033	1033	1033
LicenseType	DISABLED	DISABLED	DISABLED
MachineName	NULL	NULL	NULL
NumLicenses	NULL	NULL	NULL
PathSeparator	\	\	\
ProductBuild	2000	502	502
ProductVersion	12.0.2000.8	12.0.2000.8	12.0.2000.8
ResourceLastUpdateDateTime	11/6/2024 3:41:27 PM	3/5/2025 12:05:50 PM	3/5/2025 12:05:50 PM
ResourceVersion	16.00.5751	17.00.502	17.00.502
ServerName	...	....	....
SqlCharSet	1	1	1
SqlCharSetName	iso_1	iso_1	iso_1
SqlSortOrder	52	52	52
SqlSortOrderName	nocase_iso	nocase_iso	nocase_iso
SuspendedDatabaseCount	NULL	0	0

It's interesting that all three instances have the same general collation, while the Engine Edition of SQL databases is not compatible with the others [2]. The Server Names has been removed manually from the output from obvious reasons. The warehouse and lakehouse are in the same environment (SQL Azure instance, see sys.databases), and therefore the same values are shown (though this might happen independently of the environments used).

The queries were run in a trial Microsoft Fabric environment. Other environments can have upon case different properties. Just replace the "--" from the commented code to get a complete overview.

The queries should run also in the other editions of SQL Server. If DATABASEPROPERTYEX is not supported, one should try DATABASEPROPERTY instead.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) SQL Server 2022: DATABASEPROPERTYEX (Transact-SQL) [link]
[2] Microsoft Learn (2024) SQL Server 2022: SERVERPROPERTY (Transact-SQL) [link]

09 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventhouses [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Eventhouses

[def]
a service that empowers users to extract insights and visualize data in motion

offers an end-to-end solution for

event-driven scenarios

⇐ rather than schedule-driven solutions [1]

a workspace of databases

can be shared across projects [1]

allows to manage multiple databases at once

sharing capacity and resources to optimize performance and cost
provides unified monitoring and management across all databases and per database [1]

provide a solution for handling and analyzing large volumes of data

particularly in scenarios requiring real-time analytics and exploration [1]
designed to handle real-time data streams efficiently [1]

lets organizations ingest, process, and analyze data in near real-time [1]

provide a scalable infrastructure that allows organizations to handle growing volumes of data, ensuring optimal performance and resource use.

preferred engine for semistructured and free text analysis
tailored to time-based, streaming events with structured, semistructured, and unstructured data [1]
allows to get data

from multiple sources,
in multiple pipelines

e.g. Eventstream, SDKs, Kafka, Logstash, data flows, etc.

multiple data formats [1]

data is automatically indexed and partitioned based on ingestion time

designed to optimize cost by suspending the service when not in use [1]

reactivating the service, can lead to a latency of a few seconds [1]

for highly time-sensitive systems that can't tolerate this latency, use Minimum consumption setting [1]

enables the service to be always available at a selected minimum level [1]

customers pay for

the minimum compute level selected [1]
the actual consumption when the compute level is above the minimum set [1]

the specified compute is available to all the databases within the eventhouse [1]

{scenario} solutions that includes event-based data

e.g. telemetry and log data, time series and IoT data, security and compliance logs, or financial records [1]

KQL databases

can be created within an eventhouse [1]
can either be a standard database, or a database shortcut [1]
an exploratory query environment is created for each KQL Database, which can be used for exploration and data management [1]
data availability in OneLake can be enabled on a database or table level [1]

Eventhouse page

serves as the central hub for all your interactions within the Eventhouse environment [1]
Eventhouse ribbon

provides quick access to essential actions within the Eventhouse

explorer pane

provides an intuitive interface for navigating between Eventhouse views and working with databases [1]

main view area

displays the system overview details for the eventhouse [1]

{feature} Eventhouse monitoring

offers comprehensive insights into the usage and performance of the eventhouse by collecting end-to-end metrics and logs for all aspects of an Eventhouse [2]
part of workspace monitoring that allows you to monitor Fabric items in your workspace [2]
provides a set of tables that can be queried to get insights into the usage and performance of the eventhouse [2]

can be used to optimize the eventhouse and improve the user experience [2]

{feature} query logs table

contains the list of queries run on an Eventhouse KQL database

for each query, a log event record is stored in the EventhouseQueryLogs table [3]

can be used to

analyze query performance and trends [3]
troubleshoot slow queries [3]
identify heavy queries consuming large amount of system resources [3]
identify the users/applications running the highest number of queries[3]

{feature} OneLake availability

{benefit} allows to create one logical copy of a KQL database data in an eventhouse by turning on the feature [4]

users can query the data in the KQL database in Delta Lake format via other Fabric engines [4]

e.g. Direct Lake mode in Power BI, Warehouse, Lakehouse, Notebooks, etc.

{prerequisite} a workspace with a Microsoft Fabric-enabled capacity [4]
{prerequisite} a KQL database with editing permissions and data [4]
{constraint} rename tables
{constraint} alter table schemas
{constraint} apply RLS to tables
{constraint} data can't be deleted, truncated, or purged
when turned on, a mirroring policy is enabled

can be used to monitor data latency or alter it to partition delta tables [4]

{feature} robust adaptive mechanism

intelligently batches incoming data streams into one or more Parquet files, structured for analysis [4]
⇐ important when dealing with trickling data [4]

⇐ writing many small Parquet files into the lake can be inefficient resulting in higher costs and poor performance [4]

delays write operations if there isn't enough data to create optimal Parquet files [4]

ensures Parquet files are optimal in size and adhere to Delta Lake best practices [4]
ensures that the Parquet files are primed for analysis and balances the need for prompt data availability with cost and performance considerations [4]
{default} the write operation can take up to 3 hours or until files of sufficient size are created [4]

typically the files have 200-256 MB
the value can be adjusted between 5 minutes and 3 hours [4]

{warning} adjusting the delay to a shorter period might result in a suboptimal delta table with a large number of small files [4]

can lead to inefficient query performance [4]

{restriction} the resultant table in OneLake is read-only and can't be optimized after creation [4]

delta tables can be partitioned to improve query speed [4]

each partition is represented as a separate column using the PartitionName listed in the Partitions list [4]

⇒ OneLake copy has more columns than the source table [4]

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2025) Microsoft Fabric: Eventhouse overview [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Eventhouse monitoring [link]

[3] Microsoft Learn (2025) Microsoft Fabric: Query logs [link]

[4] Microsoft Learn (2025) Microsoft Fabric: Eventhouse OneLake Availability [link]

[5] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]

Resources:

[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Eventhouse Monitoring (Preview) [link]

[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
KQL - Kusto Query Language
SDK - Software Development Kit
RLS - Row Level Security
RTI - Real-Time Intelligence

25 February 2025

🏭💠🗒️Microsoft Fabric: T-SQL Notebook [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 25-Feb-2024

[Microsoft Fabric] T-SQL notebook

{def} notebook that enables to write and run T-SQL code within a notebook [1]
{feature} allows to manage complex queries and write better markdown documentation [1]
{feature} allows the direct execution of T-SQL on

connected warehouse
SQL analytics endpoint
⇐ queries can be run directly on the connected endpoint [1]

multiple connections are allowed [1]

allows running cross-database queries to gather data from multiple warehouses and SQL analytics endpoints [1]
the code is run by the primary warehouse

used as default in commands which supports three-part naming, though no warehouse was provided [1]

three-part naming consists of

database name

the name of the warehouse or SQL analytics endpoint [1]

schema name
table name

{feature} autogenerate T-SQL code using the code template from the object explorer's context [1] menu
{concept} code cells

allow to create and run T-SQL code

each code cell is executed in a separate session [1]

{limitation} the variables defined in one cell are not available in another cell [1]
one can check the execution summary after the code is executed [1]

cells can be run individually or together [1]
one cell can contain multiple lines of code [1]

users can select and run subparts of a cell’s code [1]

{feature} Table tab

lists the records from the returned result set

if the execution contains multiple result set, you can switch from one to another via the dropdown menu [1]

a query can be saved as

view

via 'Save as' view
{limitation} does not support three-part naming [1]

the view is always created in the primary warehouse [1]

by setting the warehouse as the primary warehouse [1]

table

via 'Save as' table
saved as CTAS

⇐ 'Save as' is only available for the selected query text

the query text must be selected before using the Save as options

{limitation} doesn’t support

parameter cell

the parameter passed from pipeline or scheduler can't be used [1]

{feature} Recent Run

{workaround} use the current data warehouse monitoring feature to check the execution history of the T-SQL notebook [1]

{feature} the monitor URL inside the pipeline execution
{feature} snapshot
{feature} Git support
{feature} deployment pipeline support

Previous Post <<||>> Next Post

References:

[1] Microsoft Learn (2025) T-SQL support in Microsoft Fabric notebooks [link]

[2] Microsoft Learn (2025) Create and run a SQL Server notebook [link]

[3] Microsoft Learn (2025) T-SQL surface area in Microsoft Fabric [link]

[4] Microsoft Fabric Updates Blog (2024) Announcing Public Preview of T-SQL Notebook in Fabric [link]

Resources:

[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms
CTAS - Create Table as Select
T-SQL - Transact SQL

06 February 2025

🌌🏭KQL Reloaded: First Steps (Part V: Database Metadata)

When working with a new data repository, one of the first things to do is to look at database's metadata, when available, and try to get a birds eye view of what's available, how big is the databases in terms of size, tables and user-defined objects, how the schema was defined, how the data are stored, eventually how often backup are taken, what users have access and to what, etc.

So, after creating some queries in KQL and figuring out how things work, I tried to check what metadata are available, how it can be accessed, etc. The target is not to provide a full list of the available metadata, but to understand what information is available, in what format, how easy is to extract the important metadata, etc.

So, the first set of metadata is related to database:

// get database metadata metadata
.show databases (ContosoSales)

// get database metadata metadata (multiple databases)
.show databases (ContosoSales, Samples)

// get database schema metadata
.show databases (ContosoSales) schema

// get database schema metadata (multiple databases) 
.show databases (ContosoSales, Samples) schema

// get database schema violations metadata
.show database ContosoSales schema violations

// get database entities metadata
.show databases entities with (showObfuscatedStrings=true)
| where DatabaseName == "ContosoSales"

// get database metadata 
.show databases entities with (resolveFunctionsSchema=true)
| where DatabaseName == "ContosoSales" and EntityType == "Table"
//| summarize count () //get the number of tables

// get a function's details
.show databases entities with (resolveFunctionsSchema=true)
| where DatabaseName == "ContosoSales" 
    and EntityType == "Function" 
    and EntityName == "SalesWithParams"

// get external tables metadata
.show external tables

// get materialized views metadata
.show materialized-views

// get query results metadata
.show stored_query_results

// get entities groups metadata
.show entity_groups

Then, it's useful to look at the database objects.

// get all tables 
.show tables 
//| count

// get tables metadata
.show tables (Customers, NewSales)

// get tables schema
.show table Customers cslschema

// get schema as json
.show table Customers schema as json

// get table size: Customers
Customers
| extend sizeEstimateOfColumn = estimate_data_size(*)
| summarize totalSize_MB=round(sum(sizeEstimateOfColumn)/1024.00/1024.00,2)

Unfortunately, the public environment has restrictions in what concerns the creation of objects, while for the features available one needs to create some objects to query the corresponding metadata.

Furthermore, it would be interesting to understand who has access to the various repositories, what policies were defined, and so on.

// get principal roles
.show database ContosoSales principal roles

// get principal roles for table
.show table Customers principal roles

// get principal roles for function:
.show function SalesWithParams principal roles

// get retention policies
.show table Customers policy retention

// get sharding policies
.show table Customers policy sharding

There are many more objects one can explore. It makes sense to document the features, respectively the objects used for the various purposes.

In addition, one should check also the best practices available for the data repository (see [2]).

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Management commands overview [link]
[2] Microsoft Learn (2024) Kusto: Best practices for schema management [link]

SQL Troubles

Pages

21 June 2025

🏭🗒️Microsoft Fabric: Result Set Caching in SQL Analytics Endpoints [Notes] 🆕

09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

26 March 2025

💠🏭🗒️Microsoft Fabric: Polaris SQL Pool [Notes]

16 March 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part XI: Database and Server Properties)

09 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventhouses [Notes]

25 February 2025

🏭💠🗒️Microsoft Fabric: T-SQL Notebook [Notes] 🆕

06 February 2025

🌌🏭KQL Reloaded: First Steps (Part V: Database Metadata)

About Me