
09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premises SQL Server; for other platforms please refer to the documentation.

Last updated: 9-Apr-2025

[SQL Server 2005] Tempdb database

  • {def} system database available as a global resource to all users connected to a Database Engine instance [1]
    • does not persist after SQL Server shuts down 
      • created at restart using the specifications from the model database 
      • doesn’t need crash recovery
      • requires rollback capabilities
        • {scenario} when updating a row in a global temporary table and rolling back the transaction [12]
          • there's no way to undo this change unless the ‘before value’ of the update was logged [12]
          • there is no need to log the ‘after value’ of the update because this value is only needed when the transaction must be ‘redone’, which happens during database recovery [12]
        • {scenario} during an insert operation
          • only undo information is needed
            • ⇐ redo information is never needed because tempdb doesn’t undergo crash recovery
          • when inserting a row in, say, a global temporary table, the actual ‘row value’ is not logged because SQL Server does not need the row to undo the insert; it only needs to set the offsets within the page appropriately or, if the insert caused a new page to be allocated, to de-allocate the page [12]
          • not all objects in tempdb are subject to logging [18]
    • {characteristic} critical database
      • ⇒ needs to be configured adequately 
      • has the highest rate of create and drop actions [5]
      • under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks [5]
      • workers use tempdb like any other database
        • any worker can issue I/O to and from tempdb as needed [5]
    • if the tempdb database cannot be created, SQL Server will not start [7]
      • {recommendation} in this case, start SQL Server by using the -f startup parameter (minimal configuration) to troubleshoot
  • files
    • data file:
      • {default} tempdb.mdf
    • log file
      • {default} templog.ldf
  • {operation} changing the tempdb database
    • {step} use the ALTER DATABASE statement with the MODIFY FILE clause to change the physical file names of each file in the tempdb database to refer to the new physical location [7]
    • {step} stop and restart SQL Server (see the sketch below)
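
A minimal sketch for relocating tempdb’s files (tempdev and templog are the default logical names; the paths are placeholders to be adjusted to the actual environment):

-- relocate tempdb's data and log files (takes effect at the next restart)
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'D:\SQLData\tempdb.mdf');

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'E:\SQLLogs\templog.ldf');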
  • {operation} create file group
    • only one file group for data and one file group for logs are allowed.
    • {default} the number of files is set to 1
    • can be a high contention point for internal tracking structures
      •  often it's better to create multiple files for tempdb
    • multiple files can be created for each file group
      • {recommendation} set the number of files to match the number of CPUs that are configured for the instance
        • ⇐ if the number of logical processors is less than or equal to eight [1]
          • otherwise, use eight data files [1]
          • if contention is still observed, increase the number of data files by multiples of four until the contention decreases to acceptable levels, or make changes to the workload [1]
        • ⇐ it’s not imperative

      • {scenario} add a file per CPU (aka multiple files)
        • allows the file group manager to 
          • pick the next file 
            • there are some optimizations here for tempdb to avoid contention and skip to the next file when under contention [4]
          •  then use the target location to start locating free space [4]                
        • ⇐ SQL Server creates a logical scheduler for each of the CPUs presented to it (logical or physical)
          • allows each of the logical schedulers to loosely align with a file [4]
            • since there can be only one active worker per scheduler, this allows each worker to have its own tempdb file (at that instant) and avoid allocation contention [4]
        • internal latching and other activities achieve better separation (per file) 
          • ⇒ the workers don’t cause a resource contention on a single tempdb file
            • rare type of contention [5]
        • {misconception}
          • that adding files leads to new I/O threads per file [5]
      • {scenario} add more data files 
        • may help to solve potential performance problems that are due to I/O operations. 
        • helps to avoid latch contention on allocation pages
          • ⇐ manifested as UP-latch waits
      • {downside} having too many files 
        • increases the cost of file switching
        • requires more IAM pages
        • increases the manageability overhead. 
      • {recommendation} configure the size of the files to be equal
        • ⇐ to better use the allocation mechanism (proportional fill)
      • {recommendation} create tempdb files striped across fast disks 
        • ⇐ see the sketch below for adding equally sized files
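
A hedged sketch for adding equally sized data files (logical names, paths, and sizes are placeholders; the number of files should follow the CPU-based recommendation above):

-- add equally sized data files to tempdb
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'D:\SQLData\tempdb2.ndf', SIZE = 1GB, FILEGROWTH = 256MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3, FILENAME = 'D:\SQLData\tempdb3.ndf', SIZE = 1GB, FILEGROWTH = 256MB);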
    • the size of the tempdb files affects the performance of the database 
    • {event} server restarts
      • the tempdb file size is reset to the configured value (the default is 8 MB).
    • {feature} auto grow 
      • growth is temporary 
        • ⇐ unlike for other types of databases, the size reverts at restart
      • when tempdb grows, all transactional activity may come to a halt 
        • because tempdb is used by most operations and these activities will get blocked until more disk space gets allocated to tempdb [12]
        • {recommendation} pre-allocate tempdb at a size that matches the needs of the workload [12]
      • {recommendation} should be used for exceptions rather than as a strategy [12]
        • ⇐ see the pre-allocation sketch below
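
A sketch for pre-allocating the files so that auto grow remains the exception (the sizes are placeholders to be derived from the workload):

-- pre-allocate tempdb instead of relying on auto grow
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 4GB);

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 1GB);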
    • transactions
      • lose the durability attribute
      • can be rolled back
        • there is no need to REDO them because the contents of tempdb do not persist across server restarts
          • because the transaction log does not need to be flushed, transactions are committed faster in tempdb than in user databases
        • most internal operations on tempdb do not generate log records because there is no need to roll them back
    • {restriction} add filegroups [1]
    • {restriction} remove the primary filegroup, primary data file, or log file [1]
    • {restriction} renaming the database or primary filegroup [1]
    • {restriction} back up or restore the database [1]
    • {restriction} change collation [1]
      • the default collation is the server collation [1]
    • {restriction} change the database owner [1]
      • owned by sa
    • {restriction} create a database snapshot [1]
    • {restriction} drop the database [1]
    • {restriction} drop the guest user from the database [1]
    • {restriction} enable change data capture (CDC)
    • {restriction} participate in database mirroring [1]
    • {restriction} can only be configured in the simple recovery model
      • ⇒ the value can’t be changed
    • {restriction} auto shrink is not allowed for tempdb
      • database shrink and file shrink capabilities are limited
        •  because many of the hidden objects stored in tempdb cannot be moved by shrink operations
    • {restriction} the database CHECKSUM option cannot be enabled
    • {restriction} database snapshots cannot be created
    • {restriction} DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported.
      • only offline checking for DBCC CHECKTABLE is performed
        • ⇒ a TAB-S lock is needed
        • there are internal consistency checks that occur when tempdb is in use
          • if the checks fail, the user connection is broken and the tempdb space used by the connection is freed 
    • {restriction} set the database to OFFLINE
    • {restriction} set the database or primary filegroup to READ_ONLY [1]
  • used by
    • {feature} query
    • {feature} temporary tables
    • {feature} table variables
    • {feature} table-valued functions
    • {feature} user-defined functions
    • {feature} online/offline index creation
    • {feature} triggers 
    • {feature} cursors
    • {feature} RCSI
    • {feature} MARS
    • {feature} DBCC CHECK
    • {feature} LOB parameters 
    • {feature} Service Broker and event notification
    • {feature} XML and LOB variable
    • {feature} query notifications
    • {feature} database mail
  • used to store
    • user objects
      • objects explicitly created
        • user objects that can be created in a user database can also be created in tempdb [1]
          • {limitation} there's no durability guarantee [1]
          • {limitation} dropped when the Database Engine instance restarts [1]
      • {type} global and local temporary tables 
        • correspond to ## and # tables and table variables created explicitly by the application [12]
          • REDO information is not logged
      • {type} indexes on global and local temporary tables
      • {type} table variables
      • {type} tables returned in table-valued functions
      • {type} cursors
      • improved caching for temporary objects
    • temp objects
      • only cached when all of the following conditions hold: [12]
        • no named constraints are created
        • no DDL statements that affect the table are run after the temp table has been created
          • e.g. CREATE INDEX or CREATE STATISTICS statements
        • the table is not created by using dynamic SQL
          • e.g. sp_executesql N'create table #t(a int)'
    • internal objects
      • objects created internally by SQL Server
        • each object uses a minimum of nine pages [1]
          • an IAM page and an eight-page extent [1]
      • {type} work tables
        • store intermediate results for spools, cursors, sorts, and temporary LOB storage [1]
      • {type} work files
        • used for hash join or hash aggregate operations [1]
      • {type} intermediate sort results
        • used for operations 
          • creating or rebuilding indexes
            • if SORT_IN_TEMPDB is specified
          • certain GROUP BY, ORDER BY
          • UNION queries
      • {restriction} applications cannot directly insert into or delete rows from them
      • {restriction} metadata is stored in memory
      • {restriction} metadata does not appear in system catalog views such as sys.all_objects.
      • {restriction} considered to be hidden objects
      • {restriction} updates to them do not generate log records
      • {restriction} page allocations do not generate log records unless on a sort unit. If the statement fails, these objects are deallocated. 
      • used
        • to store intermediate runs for sort
        • to store intermediate results for hash joins and hash aggregates.
        • to store XML variables or other LOB data type variables
          • e.g. text, image, ntext, varchar(max), varbinary(max), and all others
        • by queries that need a spool to store intermediate results
        • by keyset cursors to store the keys
        • by static cursors to store a query result
        • by Service Broker to store messages in transit
        • by INSTEAD OF triggers to store data for internal processing
        • by any feature that uses the above mentioned operations 
    • version stores
      • collections of data pages that hold the data rows that support row versioning [1]
      • {type} common version store
      • {type} online index build version store
  • {concept} storage
    • I/O characteristics
      • {requirement} read after writes
        • {def} the ability of the subsystem to service read requests with the latest data image when the read is issued after any write is successfully completed [7]
      • {recommendation} write ordering
        • {def} the ability of the subsystem to maintain the correct order of write operations [7]
      • {recommendation} torn I/O prevention
        • {def} the ability of the system to avoid splitting individual I/O requests [7]
      • {requirement} physical sector alignment and size
        • devices are required to support sector attributes permitting SQL Server to perform writes on physical sector-aligned boundaries and in multiples of the sector size [7]
    • can be put on specialty systems
      • ⇐ they can’t be used for other databases
      • RAM disks
        • non-durable media
        • {downside} create a double RAM cache
          • one in the buffer pool and one on the RAM disk
            • ⇐ directly takes away from the buffer pool’s total possible size and generally decreases the performance [7]
        • {downside} RAM disk and RAM-based file cache implementations give up RAM that SQL Server could otherwise use
      • solid state
        • high speed subsystems
        • {recommendation} confirm with the product vendor to guarantee full compliance with SQL Server I/O needs [7]
    • sort errors were frequently solved by moving tempdb to a non-caching local drive or by disabling the read caching mechanism [8]
  • {feature} logging optimization 
    • avoids logging the "after value" in certain log records in tempdb
      • ⇐ without this optimization, both the before and the after values of the data are recorded in the log file when an UPDATE operation is performed
    • can significantly reduce the size of the tempdb log as well as reduce the amount of I/O traffic on the tempdb log device
  • {feature} instant data file initialization 
    • skips zeroing out the NTFS file when the file is created or when its size is increased
    • {benefit} minimizes overhead significantly when tempdb needs to auto grow. 
      • without this, auto grow could take a long time and lead to application timeout.
    • reduces the impact of database creation [5]
      • because zeros don’t have to be stamped in all bytes of a database file, only in the log files [5]
        • reduces the gain from using multiple threads during database creation [5]
  • {feature} proportional fill optimization
    • reduces UP latch contention in tempdb
    • when there are multiple data files in tempdb, each file is filled in proportion to the free space that is available in the file so that all of the files fill up at about the same time
      •  accomplished by removing a latch that was taken during proportional fill.
  • {feature} deferred drop in tempdb
    • when a large temporary table is dropped by an application, it is handled by a background task and the application does not have to wait
      • ⇒ faster response time to applications.
  • {feature} worktable caching 
    • {improvement} when a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated
      • ⇐ in addition, the first nine pages for the work table are kept.
  • {feature} caches temporary objects
    • when table-valued functions, table variables, or local temporary tables are used in a stored procedure, function, or trigger, the frequent drop and create of these temporary objects can be time consuming
      • ⇐ this can cause contention on tempdb system catalog tables and allocation pages 
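
The following sketch offers a quick look at how tempdb's space is split across the categories discussed above (based on sys.dm_db_file_space_usage; page counts are converted to MB):

-- tempdb space usage by object category (MB)
SELECT SUM(user_object_reserved_page_count) * 8 / 1024.0 AS user_objects_MB
, SUM(internal_object_reserved_page_count) * 8 / 1024.0 AS internal_objects_MB
, SUM(version_store_reserved_page_count) * 8 / 1024.0 AS version_store_MB
, SUM(unallocated_extent_page_count) * 8 / 1024.0 AS free_space_MB
FROM tempdb.sys.dm_db_file_space_usage;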

References
[1] Microsoft Learn (2024) SQL Server: Tempdb database [link]
[2] Wei Xiao et al (2006) Working with tempdb in SQL Server 2005
[4] CSS SQL Server Engineers (2009) SQL Server TempDB – Number of Files – The Raw Truth [link]
[5] SQL Server Support Blog (2007) SQL Server Urban Legends Discussed [link]
[7] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
[8] Microsoft Learn (2023) SQL Server: SQL Server diagnostics detects unreported I/O problems due to stale reads or lost writes [link]
[12a] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: Simple Example), by Sunil Agarwal [link]
[12b] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: logical structure), by Sunil Agarwal [link]
[18] Simple Talk (2020) Temporary Tables in SQL Server, by Phil Factor [link]
[19] Microsoft Learn (2023) SQL Server: Recommendations to reduce allocation contention in SQL Server tempdb database [link]

Acronyms:
CDC - Change Data Capture
DB - Database
DDL - Data Definition Language
I/O - Input/Output
IAM - Index Allocation Map
LOB - Large Object
MARS - Multiple Active Result Sets
NTFS - New Technology File System
RAM - Random-Access Memory
RCSI - Read Committed Snapshot Isolation

26 March 2025

💠🏭🗒️Microsoft Fabric: Polaris SQL Pool [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!

Unfortunately, besides the referenced papers, there's almost no material that could be used to enhance the understanding of the concepts presented. 

Last updated: 26-Mar-2025

Read and Write Operations in Polaris [2]

[Microsoft Fabric] Polaris SQL Pool

  • {def} distributed SQL query engine that powers Microsoft Fabric's data warehousing capabilities
    • designed to unify data warehousing and big data workloads while separating compute and state for seamless cloud-native operations
    • based on a robust DCP 
      • designed to execute read-only queries in a scalable, dynamic and fault-tolerant way [1]
      • a highly-available micro-service architecture with well-defined responsibilities [2]
        • data and query processing is packaged into units (aka tasks) 
          • can be readily moved across compute nodes and re-started at the task level
        • widely-partitioned data with a flexible distribution model [2]
        • a task-level "workflow-DAG" that is novel in spanning multiple queries [2]
        • a framework for fine-grained monitoring and flexible scheduling of tasks [2]
  • {component} SQL Server Front End (SQL-FE)
    • responsible for 
      • compilation
      • authorization
      • authentication
      • metadata
        • used by the compiler to 
          • {operation} generate the search space (aka MEMO) for incoming queries
          • {operation} bind metadata to data cells
          • leveraged to ensure the durability of the transaction manifests at commit [2]
            • only transactions that successfully commit need to be actively tracked to ensure consistency [2]
            • any manifests and data associated with aborted transactions are systematically garbage-collected from OneLake through specialized system tasks [2]
  • {component} SQL Server Backend (SQL-BE)
    • used to perform write operations on the LST [2]
      • inserting data into a LST creates a set of Parquet files that are then recorded in the transaction manifest [2]
      • a transaction is represented by a single manifest file that is modified concurrently by (one or more) SQL BEs [2]
        • SQL BE leverages the Block Blob API provided by ADLS to coordinate the concurrent writes  [2]
        • each SQL BE instance serializes the information about the actions it performed, either adding a Parquet file or removing it [2]
          • the serialized information is then uploaded as a block to the manifest file
          • uploading the block does not yet make any visible changes to the file [2]
            • each block is identified by a unique ID generated on the writing SQL BE [2]
        • after completion, each SQL BE returns the ID of the block(s) it wrote to the Polaris DCP [2]
          • the block IDs are then aggregated by the Polaris DCP and returned to the SQL FE as the result of the query [2]
      • the SQL FE further aggregates the block IDs and issues a Commit Block operation against storage with the aggregated block IDs [2]
        • at this point, the changes to the file on storage will become effective [2]
      • changes to the manifest file are not visible until the Commit operation on the SQL FE
        • the Polaris DCP can freely restart any part of the operation in case there is a failure in the node topology [2]
      • the IDs of any blocks written by previous attempts are not included in the final list of block IDs and are discarded by storage [2]
    • [read operations] SQL BE is responsible for reconstructing the table snapshot based on the set of manifest files managed in the SQL FE
      • the result is the set of Parquet data files and deletion vectors that represent the snapshot of the table [2]
        • queries over these are processed by the SQL Server query execution engine [2]
        • the reconstructed state is cached in memory and organized in such a way that the table state can be efficiently reconstructed as of any point in time [2]
          • enables the cache to be used by different operations operating on different snapshots of the table [2]
          • enables the cache to be incrementally updated as new transactions commit [2]
  • {feature} supports explicit user transactions
    • can execute multiple statements within the same transaction in a consistent way
      • the manifest file associated with the current transaction captures all the (reconciled) changes performed by the transaction [2]
        • changes performed by prior statements in the current transaction need to be visible to any subsequent statement inside the transaction (but not outside of the transaction) [2]
    • [multi-statement transactions] in addition to the committed set of manifest files, the SQL BE reads the manifest file of the current transaction and then overlays these changes on the committed manifests [1]
    • {write operations} the behavior of the SQL BE depends on the type of the operation.
      • insert operations 
        • only add new data and have no dependency on previous changes [2]
        • the SQL BE can serialize the metadata blocks holding information about the newly created data files just like before [2]
        • the SQL FE, instead of committing only the IDs of the blocks written by the current operation, will instead append them to the list of previously committed blocks
          • ⇐ effectively appends the data to the manifest file [2]
    • {update|delete operations} 
      • handled differently 
        • ⇐ since they can potentially further modify data already modified by a prior statement in the same transaction [2]
          • e.g. an update operation can be followed by another update operation touching the same rows
        • the final transaction manifest should not contain any information about the parts from the first update that were made obsolete by the second update [2]
      • SQL BE leverages the partition assignment from the Polaris DCP to perform a distributed rewrite of the transaction manifest to reconcile the actions of the current operation with the actions recorded by the previous operation [2]
        • the resulting block IDs are sent again to the SQL FE where the manifest file is committed using the (rewritten) block IDs [2]
  • {concept} Distributed Query Processor (DQP)
    • responsible for 
      • distributed query optimization
      • distributed query execution
      • query execution topology management
  • {concept} Workload Management (WLM)
    •  consists of a set of compute servers that are, simply, an abstraction of a host provided by the compute fabric, each with a dedicated set of resources (disk, CPU and memory) [2]
      • each compute server runs two micro-services
        • {service} Execution Service (ES) 
          • responsible for tracking the life span of tasks assigned to a compute container by the DQP [2]
        • {service} SQL Server instance
          • used as the back-bone for execution of the template query for a given task  [2]
            • ⇐ holds a cache on top of local SSDs 
              • in addition to in-memory caching of hot data
            • data can be transferred from one compute server to another
              • via dedicated data channels
              • the data channel is also used by the compute servers to send results to the SQL FE that returns the results to the user [2]
              • the life cycle of a query is tracked via control flow channels from the SQL FE to the DQP, and the DQP to the ES [2]
  • {concept} cell data abstraction
    • the key building block that enables to abstract data stores
      • abstracts DQP from the underlying store [1]
      • any dataset can be mapped to a collection of cells [1]
      • allows distributing query processing over data in diverse formats [1]
      • tailored for vectorized processing when the data is stored in columnar formats [1] 
      • further improves relational query performance
    • two-dimensional
      • distributions (data alignment)
      • partitions (data pruning)
    • each cell is self-contained with its own statistics [1]
      • used for both global and local QO [1]
      • cells can be grouped physically in storage [1]
      • queries can selectively reference either cell dimension or even individual cells depending on predicates and type of operations present in the query [1]
    • {concept} distributed query processing (DQP) framework
      • operates at the cell level 
      • agnostic to the details of the data within a cell
        • data extraction from a cell is the responsibility of the (single node) query execution engine, which is primarily SQL Server, and is extensible for new data types [1], [2]
  • {concept} dataset
    • logically abstracted as a collection of cells [1] 
    • can be arbitrarily assigned to compute nodes to achieve parallelism [1]
    • uniformly distributed across a large number of cells 
      • [scale-out processing] each dataset must be distributed across thousands of buckets or subsets of data objects, such that they can be processed in parallel across nodes
  • {concept} session
    • supports a spectrum of consumption models, ranging from serverless ad-hoc queries to long-standing pools or clusters [1]
    • all data are accessible from any session [1]
      • multiple sessions can access all underlying data concurrently  [1]
  • {concept} Physical Metadata layer
    • new layer introduced in the SQL Server storage engine [2]
See also: Polaris

References:
[1] Josep Aguilar-Saborit et al (2020) POLARIS: The Distributed SQL Engine in Azure Synapse, Proceedings of the VLDB Endowment PVLDB 13(12) [link]
[2] Josep Aguilar-Saborit et al (2024), Extending Polaris to Support Transactions [link]
[3] Gjnana P Duvvuri (2024) Microsoft Fabric Warehouse Deep Dive into Polaris Analytic Engine [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
[R2] Patrick Pichler (2023) Data Warehouse (Polaris) vs. Data Lakehouse (Spark) in Microsoft Fabric [link]
[R3] Tiago Balabuch (2023) Microsoft Fabric Data Warehouse - The Polaris engine [link]

Acronyms:
CPU - Central Processing Unit
DAG - Directed Acyclic Graph
DB - Database
DCP - Distributed Computation Platform 
DQP - Distributed Query Processing 
DWH - Data Warehouses 
ES - Execution Service
LST - Log-Structured Table
SQL BE - SQL Backend
SQL FE - SQL Frontend
SSD - Solid State Disk
WAL - Write-Ahead Log
WLM - Workload Management

16 March 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part XI: Database and Server Properties)

When taking over a SQL Server or one of its databases, one of the first checks I do focuses on the overall configuration, going through the UI available for admins to see whether anything requires further investigation. If no documentation is available, I run a few scripts and export their output as a baseline. 

Especially when documenting the configuration, it's useful to export the options and properties defined at database level. Besides the collation and probably the recovery mode, the rest of the configuration is typically similar, though in exceptional cases one should expect surprises that require further investigation! 

The following query retrieves in a consolidated way all the options and properties of a SQL database in Microsoft Fabric. 

-- database settings/properties 
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') Collation
--, DATABASEPROPERTYEX(DB_NAME(), 'ComparisonStyle')  ComparisonStyle
, DATABASEPROPERTYEX(DB_NAME(), 'Edition') Edition
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiNullDefault') IsAnsiNullDefault
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiNullsEnabled') IsAnsiNullsEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiPaddingEnabled') IsAnsiPaddingEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiWarningsEnabled') IsAnsiWarningsEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsArithmeticAbortEnabled') IsArithmeticAbortEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoClose') IsAutoClose
, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoCreateStatistics') IsAutoCreateStatistics
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoCreateStatisticsIncremental') IsAutoCreateStatisticsIncremental
--, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoShrink') IsAutoShrink
, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoUpdateStatistics') IsAutoUpdateStatistics
--, DATABASEPROPERTYEX(DB_NAME(), 'IsClone') IsClone
--, DATABASEPROPERTYEX(DB_NAME(), 'IsCloseCursorsOnCommitEnabled') IsCloseCursorsOnCommitEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsDatabaseSuspendedForSnapshotBackup') IsDatabaseSuspendedForSnapshotBackup
, DATABASEPROPERTYEX(DB_NAME(), 'IsFulltextEnabled') IsFulltextEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsInStandBy') IsInStandBy
--, DATABASEPROPERTYEX(DB_NAME(), 'IsLocalCursorsDefault') IsLocalCursorsDefault
--, DATABASEPROPERTYEX(DB_NAME(), 'IsMemoryOptimizedElevateToSnapshotEnabled') IsMemoryOptimizedElevateToSnapshotEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsMergePublished') IsMergePublished
--, DATABASEPROPERTYEX(DB_NAME(), 'IsNullConcat') IsNullConcat
--, DATABASEPROPERTYEX(DB_NAME(), 'IsNumericRoundAbortEnabled') IsNumericRoundAbortEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsParameterizationForced') IsParameterizationForced
--, DATABASEPROPERTYEX(DB_NAME(), 'IsQuotedIdentifiersEnabled') IsQuotedIdentifiersEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsPublished') IsPublished
--, DATABASEPROPERTYEX(DB_NAME(), 'IsRecursiveTriggersEnabled') IsRecursiveTriggersEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsSubscribed') IsSubscribed
--, DATABASEPROPERTYEX(DB_NAME(), 'IsSyncWithBackup') IsSyncWithBackup
--, DATABASEPROPERTYEX(DB_NAME(), 'IsTornPageDetectionEnabled') IsTornPageDetectionEnabled
--, DATABASEPROPERTYEX(DB_NAME(), 'IsVerifiedClone') IsVerifiedClone
--, DATABASEPROPERTYEX(DB_NAME(), 'IsXTPSupported') IsXTPSupported
, DATABASEPROPERTYEX(DB_NAME(), 'LastGoodCheckDbTime') LastGoodCheckDbTime
, DATABASEPROPERTYEX(DB_NAME(), 'LCID') LCID
--, DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') MaxSizeInBytes
, DATABASEPROPERTYEX(DB_NAME(), 'Recovery') Recovery
--, DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') ServiceObjective
--, DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjectiveId') ServiceObjectiveId
, DATABASEPROPERTYEX(DB_NAME(), 'SQLSortOrder') SQLSortOrder
, DATABASEPROPERTYEX(DB_NAME(), 'Status') Status
, DATABASEPROPERTYEX(DB_NAME(), 'Updateability') Updateability
, DATABASEPROPERTYEX(DB_NAME(), 'UserAccess') UserAccess
, DATABASEPROPERTYEX(DB_NAME(), 'Version') Version
--, DATABASEPROPERTYEX(DB_NAME(), 'ReplicaID') ReplicaID

Output:

Collation Edition IsAutoCreateStatistics IsAutoUpdateStatistics IsFulltextEnabled LastGoodCheckDbTime LCID Recovery SQLSortOrder Status Updateability UserAccess Version
SQL_Latin1_General_CP1_CI_AS FabricSQLDB 1 1 1 12/31/1899 1033 FULL 52 ONLINE READ_WRITE MULTI_USER 981

The query can be run also against the SQL analytics endpoints available for warehouses in Microsoft Fabric.

Output:

Collation Edition IsAutoCreateStatistics IsAutoUpdateStatistics IsFulltextEnabled LastGoodCheckDbTime LCID Recovery SQLSortOrder Status Updateability UserAccess Version
Latin1_General_100_BIN2_UTF8 DataWarehouse 1 1 1 12/31/1899 1033 SIMPLE 0 ONLINE READ_WRITE MULTI_USER 987

Respectively, for lakehouses:

Collation Edition IsAutoCreateStatistics IsAutoUpdateStatistics IsFulltextEnabled LastGoodCheckDbTime LCID Recovery SQLSortOrder Status Updateability UserAccess Version
Latin1_General_100_BIN2_UTF8 LakeWarehouse 1 1 1 12/31/1899 1033 SIMPLE 0 ONLINE READ_WRITE MULTI_USER 987

A similar output is obtained if one runs the query against SQL database's SQL analytics endpoint:

Output:

Collation Edition IsAutoCreateStatistics IsAutoUpdateStatistics IsFulltextEnabled LastGoodCheckDbTime LCID Recovery SQLSortOrder Status Updateability UserAccess Version
Latin1_General_100_BIN2_UTF8 LakeWarehouse 1 1 1 12/31/1899 1033 SIMPLE 0 ONLINE READ_WRITE MULTI_USER 987

SQL databases seem to inherit the collation from the earlier versions of SQL Server.

Another meaningful value for SQL databases is MaxSizeInBytes, which in my environment had a value of 3,298,534,883,328 bytes (÷ 1,073,741,824 = 3,072 GB, i.e. 3 TB).
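
The conversion can be done directly in T-SQL (a small sketch; the cast is needed because DATABASEPROPERTYEX returns a sql_variant):

-- maximum database size in GB
SELECT CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / (1024.0 * 1024 * 1024) MaxSizeGB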

There are however also server properties. Here's the consolidated overview:

-- server properties
SELECT --SERVERPROPERTY('BuildClrVersion') BuildClrVersion
 SERVERPROPERTY('Collation') Collation
--, SERVERPROPERTY('CollationID') CollationID
, SERVERPROPERTY('ComparisonStyle') ComparisonStyle
--, SERVERPROPERTY('ComputerNamePhysicalNetBIOS') ComputerNamePhysicalNetBIOS
, SERVERPROPERTY('Edition') Edition
--, SERVERPROPERTY('EditionID') EditionID
, SERVERPROPERTY('EngineEdition') EngineEdition
--, SERVERPROPERTY('FilestreamConfiguredLevel') FilestreamConfiguredLevel
--, SERVERPROPERTY('FilestreamEffectiveLevel') FilestreamEffectiveLevel
--, SERVERPROPERTY('FilestreamShareName') FilestreamShareName
--, SERVERPROPERTY('HadrManagerStatus') HadrManagerStatus
--, SERVERPROPERTY('InstanceDefaultBackupPath') InstanceDefaultBackupPath
, SERVERPROPERTY('InstanceDefaultDataPath') InstanceDefaultDataPath
--, SERVERPROPERTY('InstanceDefaultLogPath') InstanceDefaultLogPath
--, SERVERPROPERTY('InstanceName') InstanceName
, SERVERPROPERTY('IsAdvancedAnalyticsInstalled') IsAdvancedAnalyticsInstalled
--, SERVERPROPERTY('IsBigDataCluster') IsBigDataCluster
--, SERVERPROPERTY('IsClustered') IsClustered
, SERVERPROPERTY('IsExternalAuthenticationOnly') IsExternalAuthenticationOnly
, SERVERPROPERTY('IsExternalGovernanceEnabled') IsExternalGovernanceEnabled
, SERVERPROPERTY('IsFullTextInstalled') IsFullTextInstalled
--, SERVERPROPERTY('IsHadrEnabled') IsHadrEnabled
--, SERVERPROPERTY('IsIntegratedSecurityOnly') IsIntegratedSecurityOnly
--, SERVERPROPERTY('IsLocalDB') IsLocalDB
--, SERVERPROPERTY('IsPolyBaseInstalled') IsPolyBaseInstalled
--, SERVERPROPERTY('IsServerSuspendedForSnapshotBackup') IsServerSuspendedForSnapshotBackup
--, SERVERPROPERTY('IsSingleUser') IsSingleUser
--, SERVERPROPERTY('IsTempDbMetadataMemoryOptimized') IsTempDbMetadataMemoryOptimized
, SERVERPROPERTY('IsXTPSupported') IsXTPSupported
, SERVERPROPERTY('LCID') LCID
, SERVERPROPERTY('LicenseType') LicenseType
, SERVERPROPERTY('MachineName') MachineName
, SERVERPROPERTY('NumLicenses') NumLicenses
, SERVERPROPERTY('PathSeparator') PathSeparator
--, SERVERPROPERTY('ProcessID') ProcessID
, SERVERPROPERTY('ProductBuild') ProductBuild
--, SERVERPROPERTY('ProductBuildType') ProductBuildType
--, SERVERPROPERTY('ProductLevel') ProductLevel
--, SERVERPROPERTY('ProductMajorVersion') ProductMajorVersion
--, SERVERPROPERTY('ProductMinorVersion') ProductMinorVersion
--, SERVERPROPERTY('ProductUpdateLevel') ProductUpdateLevel
--, SERVERPROPERTY('ProductUpdateReference') ProductUpdateReference
--, SERVERPROPERTY('ProductUpdateType') ProductUpdateType
, SERVERPROPERTY('ProductVersion') ProductVersion
, SERVERPROPERTY('ResourceLastUpdateDateTime') ResourceLastUpdateDateTime
, SERVERPROPERTY('ResourceVersion') ResourceVersion
, SERVERPROPERTY('ServerName') ServerName
, SERVERPROPERTY('SqlCharSet') SqlCharSet
, SERVERPROPERTY('SqlCharSetName') SqlCharSetName
, SERVERPROPERTY('SqlSortOrder') SqlSortOrder
, SERVERPROPERTY('SqlSortOrderName') SqlSortOrderName
, SERVERPROPERTY('SuspendedDatabaseCount') SuspendedDatabaseCount

Output (consolidated):

Property SQL database Warehouse Lakehouse
Collation SQL_Latin1_General_CP1_CI_AS SQL_Latin1_General_CP1_CI_AS SQL_Latin1_General_CP1_CI_AS
ComparisonStyle 196609 196609 196609
Edition SQL Azure SQL Azure SQL Azure
EngineEdition 12 11 11
InstanceDefaultDataPath NULL NULL NULL
IsAdvancedAnalyticsInstalled 1 1 1
IsExternalAuthenticationOnly 1 0 0
IsExternalGovernanceEnabled 1 1 1
IsFullTextInstalled 1 0 0
IsXTPSupported 1 1 1
LCID 1033 1033 1033
LicenseType DISABLED DISABLED DISABLED
MachineName NULL NULL NULL
NumLicenses NULL NULL NULL
PathSeparator \ \ \
ProductBuild 2000 502 502
ProductVersion 12.0.2000.8 12.0.2000.8 12.0.2000.8
ResourceLastUpdateDateTime 11/6/2024 3:41:27 PM 3/5/2025 12:05:50 PM 3/5/2025 12:05:50 PM
ResourceVersion 16.00.5751 17.00.502 17.00.502
ServerName ... .... ....
SqlCharSet 1 1 1
SqlCharSetName iso_1 iso_1 iso_1
SqlSortOrder 52 52 52
SqlSortOrderName nocase_iso nocase_iso nocase_iso
SuspendedDatabaseCount NULL 0 0

It's interesting that all three instances have the same general collation, while the Engine Edition of SQL databases is not compatible with the others [2]. The server names have been removed manually from the output for obvious reasons. The warehouse and lakehouse are in the same environment (SQL Azure instance, see sys.databases), and therefore the same values are shown (though this might happen independently of the environments used).

The queries were run in a trial Microsoft Fabric environment. Depending on the case, other environments can have different properties. Just remove the "--" from the commented code to get a complete overview.

The queries should run also in the other editions of SQL Server. If DATABASEPROPERTYEX is not supported, one should try DATABASEPROPERTY instead.
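
For example (a sketch; DATABASEPROPERTY exposes a more limited, older set of properties):

-- fallback to DATABASEPROPERTY for older editions
SELECT DATABASEPROPERTY(DB_NAME(), 'IsAutoShrink') IsAutoShrink
, DATABASEPROPERTY(DB_NAME(), 'IsFulltextEnabled') IsFulltextEnabled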

Happy coding!


References:
[1] Microsoft Learn (2024) SQL Server 2022: DATABASEPROPERTYEX (Transact-SQL) [link]
[2] Microsoft Learn (2024) SQL Server 2022: SERVERPROPERTY (Transact-SQL) [link]

09 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventhouses [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Eventhouses

  • {def} a service that empowers users to extract insights and visualize data in motion
    • offers an end-to-end solution for 
      • event-driven scenarios
        • ⇐ rather than schedule-driven solutions  [1]
    • a workspace of databases
      • can be shared across projects [1]
  • allows to manage multiple databases at once
    • sharing capacity and resources to optimize performance and cost
    • provides unified monitoring and management across all databases and per database [1]
  • provide a solution for handling and analyzing large volumes of data
    • particularly in scenarios requiring real-time analytics and exploration [1]
    • designed to handle real-time data streams efficiently [1]
      • lets organizations ingest, process, and analyze data in near real-time [1]
  • provide a scalable infrastructure that allows organizations to handle growing volumes of data, ensuring optimal performance and resource use.
    • preferred engine for semistructured and free text analysis
    • tailored to time-based, streaming events with structured, semistructured, and unstructured data [1]
    • allows to get data 
      • from multiple sources, 
      • in multiple pipelines
        • e.g. Eventstream, SDKs, Kafka, Logstash, data flows, etc.
      • multiple data formats [1]
    • data is automatically indexed and partitioned based on ingestion time
  • designed to optimize cost by suspending the service when not in use [1]
    • reactivating the service, can lead to a latency of a few seconds [1]
      • for highly time-sensitive systems that can't tolerate this latency, use Minimum consumption setting [1] 
        • enables the service to be always available at a selected minimum level [1]
          • customers pay for 
            • the minimum compute level selected [1]
            • the actual consumption when the compute level is above the minimum set [1]
        • the specified compute is available to all the databases within the eventhouse [1]
    • {scenario} solutions that includes event-based data
      • e.g. telemetry and log data, time series and IoT data, security and compliance logs, or financial records [1]
  • KQL databases 
    • can be created within an eventhouse [1]
    • can either be a standard database, or a database shortcut [1]
    • an exploratory query environment is created for each KQL Database, which can be used for exploration and data management [1]
    • data availability in OneLake can be enabled on a database or table level [1]
  • Eventhouse page 
    • serves as the central hub for all your interactions within the Eventhouse environment [1]
    • Eventhouse ribbon
      • provides quick access to essential actions within the Eventhouse
    • explorer pane
      • provides an intuitive interface for navigating between Eventhouse views and working with databases [1]
    • main view area 
      • displays the system overview details for the eventhouse [1]
  • {feature} Eventhouse monitoring
    • offers comprehensive insights into the usage and performance of the eventhouse by collecting end-to-end metrics and logs for all aspects of an Eventhouse [2]
    • part of workspace monitoring that allows you to monitor Fabric items in your workspace [2]
    • provides a set of tables that can be queried to get insights into the usage and performance of the eventhouse [2]
      • can be used to optimize the eventhouse and improve the user experience [2]
  • {feature} query logs table
    • contains the list of queries run on an Eventhouse KQL database
      • for each query, a log event record is stored in the EventhouseQueryLogs table [3]
    • can be used to
      • analyze query performance and trends [3]
      • troubleshoot slow queries [3]
      • identify heavy queries consuming large amount of system resources [3]
      • identify the users/applications running the highest number of queries[3]
  • {feature} OneLake availability
    • {benefit} allows to create one logical copy of a KQL database data in an eventhouse by turning on the feature [4]
      • users can query the data in the KQL database in Delta Lake format via other Fabric engines [4]
        • e.g. Direct Lake mode in Power BI, Warehouse, Lakehouse, Notebooks, etc.
    • {prerequisite} a workspace with a Microsoft Fabric-enabled capacity [4]
    • {prerequisite} a KQL database with editing permissions and data [4]
    • {constraint} rename tables
    • {constraint} alter table schemas
    • {constraint} apply RLS to tables
    • {constraint} data can't be deleted, truncated, or purged
    • when turned on, a mirroring policy is enabled
      • can be used to monitor data latency or alter it to partition delta tables [4]
  • {feature} robust adaptive mechanism
    • intelligently batches incoming data streams into one or more Parquet files, structured for analysis [4]
    • ⇐ important when dealing with trickling data [4]
      • ⇐ writing many small Parquet files into the lake can be inefficient resulting in higher costs and poor performance [4]
    • delays write operations if there isn't enough data to create optimal Parquet files [4]
      • ensures Parquet files are optimal in size and adhere to Delta Lake best practices [4]
      • ensures that the Parquet files are primed for analysis and balances the need for prompt data availability with cost and performance considerations [4]
      • {default} the write operation can take up to 3 hours or until files of sufficient size are created [4]
        • typically the files are 200-256 MB in size
        • the value can be adjusted between 5 minutes and 3 hours [4]
          • {warning} adjusting the delay to a shorter period might result in a suboptimal delta table with a large number of small files [4]
            • can lead to inefficient query performance [4]
        • {restriction} the resultant table in OneLake is read-only and can't be optimized after creation [4]
    • delta tables can be partitioned to improve query speed [4]
      • each partition is represented as a separate column using the PartitionName listed in the Partitions list [4]
        • ⇒ OneLake copy has more columns than the source table [4]
References:
[1] Microsoft Learn (2025) Microsoft Fabric: Eventhouse overview [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Eventhouse monitoring [link]
[3] Microsoft Learn (2025) Microsoft Fabric: Query logs [link]  
[4] Microsoft Learn (2025) Microsoft Fabric: Eventhouse OneLake Availability [link]
[5] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]

Resources:
[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Eventhouse Monitoring (Preview) [link]
[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
KQL - Kusto Query Language
SDK - Software Development Kit
RLS - Row Level Security 
RTI - Real-Time Intelligence

25 February 2025

🏭💠🗒️Microsoft Fabric: T-SQL Notebook [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 25-Feb-2025

[Microsoft Fabric] T-SQL notebook

  • {def} notebook that enables to write and run T-SQL code within a notebook [1]
  • {feature} allows to manage complex queries and write better markdown documentation [1]
  • {feature} allows the direct execution of T-SQL on
    • connected warehouse
    • SQL analytics endpoint
    • ⇐ queries can be run directly on the connected endpoint [1]
      • multiple connections are allowed [1]
  • allows running cross-database queries to gather data from multiple warehouses and SQL analytics endpoints [1]
  • the code is run by the primary warehouse
    • used as the default in commands that support three-part naming when no warehouse is provided [1]
    • three-part naming consists of (see the example after this list)
      • database name
        • the name of the warehouse or SQL analytics endpoint [1]
      • schema name
      • table name
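
A hedged example of a cross-database query relying on three-part naming (SalesWarehouse, FinanceEndpoint, and the tables are hypothetical names):

-- cross-database query over a warehouse and a SQL analytics endpoint
SELECT S.CustomerId
, S.Amount
, F.Segment
FROM SalesWarehouse.dbo.Sales S
JOIN FinanceEndpoint.dbo.CustomerSegments F
  ON S.CustomerId = F.CustomerId;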
  • {feature} autogenerate T-SQL code using the code template from the object explorer's context menu [1]
  • {concept} code cells
    • allow to create and run T-SQL code
      • each code cell is executed in a separate session [1]
        • {limitation} the variables defined in one cell are not available in another cell [1]
        • one can check the execution summary after the code is executed [1]
      • cells can be run individually or together [1]
      • one cell can contain multiple lines of code [1]
        • users can select and run subparts of a cell’s code [1]
    • {feature} Table tab
      • lists the records from the returned result set
        • if the execution contains multiple result sets, one can switch from one to another via the dropdown menu [1]
  • a query can be saved as 
    • view
      • via 'Save as' view
      • {limitation} does not support three-part naming [1]
        • the view is always created in the primary warehouse [1]
          • ⇐ the target can be changed only by setting another warehouse as the primary one [1]
    • table
      • via 'Save as' table
      • saved via a CTAS statement (see the sketch below)
    • ⇐ 'Save as' is only available for the selected query text
      • the query text must be selected before using the Save as options
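
A minimal sketch of the kind of CTAS statement such a 'Save as' table action corresponds to (table and column names are hypothetical):

-- persist a query's result as a new table (CTAS)
CREATE TABLE dbo.TopCustomers
AS
SELECT CustomerId
, SUM(Amount) TotalAmount
FROM dbo.Sales
GROUP BY CustomerId;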
  • {limitation} doesn’t support 
    • parameter cell
      • the parameter passed from pipeline or scheduler can't be used [1]
    • {feature} Recent Run 
      • {workaround} use the current data warehouse monitoring feature to check the execution history of the T-SQL notebook [1]
    • {feature} the monitor URL inside the pipeline execution
    • {feature} snapshot 
    • {feature} Git support 
    • {feature} deployment pipeline support 

References:
[1] Microsoft Learn (2025) T-SQL support in Microsoft Fabric notebooks [link]
[2] Microsoft Learn (2025) Create and run a SQL Server notebook [link]
[3] Microsoft Learn (2025) T-SQL surface area in Microsoft Fabric [link]
[4] Microsoft Fabric Updates Blog (2024) Announcing Public Preview of T-SQL Notebook in Fabric [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms
CTAS - Create Table as Select
T-SQL - Transact SQL

06 February 2025

🌌🏭KQL Reloaded: First Steps (Part V: Database Metadata)

When working with a new data repository, one of the first things to do is to look at the database's metadata, when available, and try to get a bird's eye view of what's available: how big the database is in terms of size, tables and user-defined objects, how the schema was defined, how the data are stored, possibly how often backups are taken, which users have access and to what, etc. 

So, after creating some queries in KQL and figuring out how things work, I tried to check what metadata are available and how they can be accessed. The goal is not to provide a full list of the available metadata, but to understand what information is available, in what format, and how easy it is to extract the important metadata. 

So, the first set of metadata is related to database:

// get database metadata
.show databases (ContosoSales)

// get database metadata (multiple databases)
.show databases (ContosoSales, Samples)

// get database schema metadata
.show databases (ContosoSales) schema

// get database schema metadata (multiple databases) 
.show databases (ContosoSales, Samples) schema

// get database schema violations metadata
.show database ContosoSales schema violations

// get database entities metadata
.show databases entities with (showObfuscatedStrings=true)
| where DatabaseName == "ContosoSales"

// get database metadata 
.show databases entities with (resolveFunctionsSchema=true)
| where DatabaseName == "ContosoSales" and EntityType == "Table"
//| summarize count () //get the number of tables

// get a function's details
.show databases entities with (resolveFunctionsSchema=true)
| where DatabaseName == "ContosoSales" 
    and EntityType == "Function" 
    and EntityName == "SalesWithParams"

// get external tables metadata
.show external tables

// get materialized views metadata
.show materialized-views

// get query results metadata
.show stored_query_results

// get entities groups metadata
.show entity_groups

Then, it's useful to look at the database objects. 

// get all tables 
.show tables 
//| count

// get tables metadata
.show tables (Customers, NewSales)

// get tables schema
.show table Customers cslschema

// get schema as json
.show table Customers schema as json

// get table size: Customers
Customers
| extend sizeEstimateOfColumn = estimate_data_size(*)
| summarize totalSize_MB=round(sum(sizeEstimateOfColumn)/1024.00/1024.00,2)

Unfortunately, the public environment has restrictions in what concerns the creation of objects, and for some of the features one first needs to create objects in order to query the corresponding metadata.

Furthermore, it would be interesting to understand who has access to the various repositories, what policies were defined, and so on. 

// get principal roles
.show database ContosoSales principal roles

// get principal roles for table
.show table Customers principal roles

// get principal roles for function:
.show function SalesWithParams principal roles

// get retention policies
.show table Customers policy retention

// get sharding policies
.show table Customers policy sharding

There are many more objects one can explore. It makes sense to document the features, respectively the objects used for the various purposes.

In addition, one should check also the best practices available for the data repository (see [2]).

Happy coding!


References:
[1] Microsoft Learn (2024) Management commands overview [link]
[2] Microsoft Learn (2024) Kusto: Best practices for schema management [link]
