
06 December 2025

💎💫SQL Reloaded: Schema Differences between Database Versions - Part I: INFORMATION_SCHEMA version

During data migrations and similar activities it's important to check what changed in the database at the various levels. Usually, it's useful to check whether schemas, object names or table definitions changed, even if the changes are thoroughly documented. One can write a script that points out all the differences in one output, though it's recommended to check the differences at each level of detail.

For this purpose one can use the INFORMATION_SCHEMA views, which are available in many RDBMS and thus make it easy to port the scripts between platforms. The queries below were run on SQL Server 2025 against Dynamics 365 schemas, though they should also run on earlier versions, incl. (Azure) SQL databases. 

Such comparisons must be done from both sides, which implies a FULL OUTER JOIN when writing a single SELECT statement. However, the results can easily become hard to read and interpret as the number of columns in the output increases. Therefore, it's recommended to keep the number of columns to the minimum that addresses the scope, respectively to break the FULL OUTER JOIN into two LEFT JOINs.

The simplest check is at schema level, and this can easily be done from both sides (note that the database names need to be replaced accordingly):

-- difference schemas (objects not available in the new schema)
SELECT *
FROM ( -- comparison
	SELECT DB1.CATALOG_NAME
	, DB1.SCHEMA_NAME
	, DB1.SCHEMA_OWNER
	, DB1.DEFAULT_CHARACTER_SET_NAME
	, DB2.SCHEMA_OWNER NEW_SCHEMA_OWNER
	, DB2.DEFAULT_CHARACTER_SET_NAME NEW_DEFAULT_CHARACTER_SET_NAME
	, CASE 
		WHEN DB2.SCHEMA_NAME IS NULL THEN 'schema only in old db'
		WHEN DB1.SCHEMA_OWNER <> IsNull(DB2.SCHEMA_OWNER, '') THEN 'different schema owner'
	  END Comment
        , CASE WHEN DB1.DEFAULT_CHARACTER_SET_NAME <> DB2.DEFAULT_CHARACTER_SET_NAME THEN 'different character sets' END Character_sets
	FROM [old database name].INFORMATION_SCHEMA.SCHEMATA DB1
	     LEFT JOIN [new database name].INFORMATION_SCHEMA.SCHEMATA DB2
	       ON DB1.SCHEMA_NAME = DB2.SCHEMA_NAME
 ) DAT
WHERE DAT.Comment IS NOT NULL
ORDER BY DAT.CATALOG_NAME
, DAT.SCHEMA_NAME


-- difference schemas (objects not available in the old schema)
SELECT *
FROM ( -- comparison
	SELECT DB1.CATALOG_NAME
	, DB1.SCHEMA_NAME
	, DB1.SCHEMA_OWNER
	, DB1.DEFAULT_CHARACTER_SET_NAME
	, DB2.SCHEMA_OWNER OLD_SCHEMA_OWNER
	, DB2.DEFAULT_CHARACTER_SET_NAME OLD_DEFAULT_CHARACTER_SET_NAME
	, CASE 
		WHEN DB2.SCHEMA_NAME IS NULL THEN 'schema only in new db'
		WHEN DB1.SCHEMA_OWNER <> IsNull(DB2.SCHEMA_OWNER, '') THEN 'different schema owner'
	  END Comment
        , CASE WHEN DB1.DEFAULT_CHARACTER_SET_NAME <> DB2.DEFAULT_CHARACTER_SET_NAME THEN 'different character sets' END Character_sets
	FROM [new database name].INFORMATION_SCHEMA.SCHEMATA DB1
	     LEFT JOIN [old database name].INFORMATION_SCHEMA.SCHEMATA DB2
	       ON DB1.SCHEMA_NAME = DB2.SCHEMA_NAME
 ) DAT
WHERE DAT.Comment IS NOT NULL
ORDER BY DAT.CATALOG_NAME
, DAT.SCHEMA_NAME

Comments:
1) The two queries can easily be combined via a UNION ALL, in which case it's a good idea to add a column that indicates the direction of the comparison, as sketched below. 
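
A minimal sketch of such a combination, reduced here to the schemas missing on either side (the database names are placeholders):

-- schemas missing on either side (sketch)
SELECT 'old vs new' Comparison_direction
, DB1.CATALOG_NAME
, DB1.SCHEMA_NAME
, DB1.SCHEMA_OWNER
FROM [old database name].INFORMATION_SCHEMA.SCHEMATA DB1
     LEFT JOIN [new database name].INFORMATION_SCHEMA.SCHEMATA DB2
       ON DB1.SCHEMA_NAME = DB2.SCHEMA_NAME
WHERE DB2.SCHEMA_NAME IS NULL
UNION ALL
SELECT 'new vs old' Comparison_direction
, DB1.CATALOG_NAME
, DB1.SCHEMA_NAME
, DB1.SCHEMA_OWNER
FROM [new database name].INFORMATION_SCHEMA.SCHEMATA DB1
     LEFT JOIN [old database name].INFORMATION_SCHEMA.SCHEMATA DB2
       ON DB1.SCHEMA_NAME = DB2.SCHEMA_NAME
WHERE DB2.SCHEMA_NAME IS NULL
ORDER BY Comparison_direction
, SCHEMA_NAME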

The next step would be to check which objects have changed:

-- table-based objects only in the old schema (tables & views)
SELECT *
FROM ( -- comparison
	SELECT DB1.TABLE_CATALOG
	, DB1.TABLE_SCHEMA
	, DB1.TABLE_NAME
	, DB1.TABLE_TYPE
	, DB2.TABLE_CATALOG NEW_TABLE_CATALOG
	, DB2.TABLE_TYPE NEW_TABLE_TYPE
	, CASE 
		WHEN DB2.TABLE_NAME IS NULL THEN 'objects only in old db'
		WHEN DB1.TABLE_TYPE <> IsNull(DB2.TABLE_TYPE, '') THEN 'different table type'
		--WHEN DB1.TABLE_CATALOG <> IsNull(DB2.TABLE_CATALOG, '') THEN 'different table catalog'
	  END Comment
	FROM [old database name].INFORMATION_SCHEMA.TABLES DB1
	    LEFT JOIN [new database name].INFORMATION_SCHEMA.TABLES DB2
	      ON DB1.TABLE_SCHEMA = DB2.TABLE_SCHEMA
	     AND DB1.TABLE_NAME = DB2.TABLE_NAME
 ) DAT
WHERE DAT.Comment IS NOT NULL
ORDER BY DAT.TABLE_SCHEMA
, DAT.TABLE_NAME

Comments:
1) If the database was imported under another name, then the TABLE_CATALOG will have different values as well.

At column level, the query increases in complexity, given the many aspects that must be considered:

-- difference columns (columns not available in the new schema, respectively changes in definitions)
SELECT *
FROM ( -- comparison
	SELECT DB1.TABLE_CATALOG
	, DB1.TABLE_SCHEMA
	, DB1.TABLE_NAME
	, DB1.COLUMN_NAME 
	, DB2.TABLE_CATALOG NEW_TABLE_CATALOG
	, CASE WHEN DB2.TABLE_NAME IS NULL THEN 'column only in old db' END Comment
	, DB1.DATA_TYPE
	, DB2.DATA_TYPE NEW_DATA_TYPE
	, CASE WHEN DB2.TABLE_NAME IS NOT NULL AND IsNull(DB1.DATA_TYPE, '') <> IsNull(DB2.DATA_TYPE, '') THEN 'Yes' END Different_data_type
	, DB1.CHARACTER_MAXIMUM_LENGTH
	, DB2.CHARACTER_MAXIMUM_LENGTH NEW_CHARACTER_MAXIMUM_LENGTH
	, CASE WHEN DB2.TABLE_NAME IS NOT NULL AND IsNull(DB1.CHARACTER_MAXIMUM_LENGTH, '') <> IsNull(DB2.CHARACTER_MAXIMUM_LENGTH, '') THEN 'Yes' END Different_maximum_length
	, DB1.NUMERIC_PRECISION
	, DB2.NUMERIC_PRECISION NEW_NUMERIC_PRECISION
	, CASE WHEN DB2.TABLE_NAME IS NOT NULL AND IsNull(DB1.NUMERIC_PRECISION, '') <> IsNull(DB2.NUMERIC_PRECISION, '') THEN 'Yes' END Different_numeric_precision
	, DB1.NUMERIC_SCALE
	, DB2.NUMERIC_SCALE NEW_NUMERIC_SCALE
	, CASE WHEN DB2.TABLE_NAME IS NOT NULL AND IsNull(DB1.NUMERIC_SCALE, '') <> IsNull(DB2.NUMERIC_SCALE,'') THEN 'Yes' END Different_numeric_scale
	, DB1.CHARACTER_SET_NAME
	, DB2.CHARACTER_SET_NAME NEW_CHARACTER_SET_NAME
	, CASE WHEN DB2.TABLE_NAME IS NOT NULL AND IsNull(DB1.CHARACTER_SET_NAME, '') <> IsNull(DB2.CHARACTER_SET_NAME, '') THEN 'Yes' END Different_character_set_name 
	, DB1.COLLATION_NAME
	, DB2.COLLATION_NAME NEW_COLLATION_NAME
	, CASE WHEN DB2.TABLE_NAME IS NOT NULL AND IsNull(DB1.COLLATION_NAME, '') <> IsNull(DB2.COLLATION_NAME, '') THEN 'Yes' END Different_collation_name
	, DB1.ORDINAL_POSITION
	, DB2.ORDINAL_POSITION NEW_ORDINAL_POSITION
	, DB1.COLUMN_DEFAULT
	, DB2.COLUMN_DEFAULT NEW_COLUMN_DEFAULT
	, DB1.IS_NULLABLE
	, DB2.IS_NULLABLE NEW_IS_NULLABLE
	FROM [old database name].INFORMATION_SCHEMA.COLUMNS DB1
	    LEFT JOIN [new database name].INFORMATION_SCHEMA.COLUMNS DB2
	      ON DB1.TABLE_SCHEMA = DB2.TABLE_SCHEMA
	     AND DB1.TABLE_NAME = DB2.TABLE_NAME
	     AND DB1.COLUMN_NAME = DB2.COLUMN_NAME
 ) DAT
WHERE DAT.Comment IS NOT NULL
  OR IsNull(DAT.Different_data_type,'') = 'Yes'
  OR IsNull(DAT.Different_maximum_length,'') = 'Yes'
  OR IsNull(DAT.Different_numeric_precision,'') = 'Yes'
  OR IsNull(DAT.Different_numeric_scale,'') = 'Yes'
  OR IsNull(DAT.Different_character_set_name,'') = 'Yes'
  OR IsNull(DAT.Different_collation_name,'') = 'Yes'
ORDER BY DAT.TABLE_SCHEMA
, DAT.TABLE_NAME
, DAT.COLUMN_NAME

Comments:
1) The query targets only the most common scenarios and must therefore be extended to handle further cases (e.g. different column defaults, nullability, etc.); a possible extension is sketched after these comments.
2) The other perspective can be obtained by inverting the database names (without the aliases) and changing the prefix of the columns from "NEW_" to "OLD_" (see the queries for schemas).
3) One can move the column-based conditions for the differences into the main query, though this requires duplicating the logic, which makes the code more challenging to change and debug. 
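
For example, the comparison could be extended to column defaults and nullability along the following lines (a sketch following the same pattern; the database names are placeholders):

-- differences in column defaults and nullability (sketch)
SELECT DB1.TABLE_SCHEMA
, DB1.TABLE_NAME
, DB1.COLUMN_NAME
, DB1.COLUMN_DEFAULT
, DB2.COLUMN_DEFAULT NEW_COLUMN_DEFAULT
, DB1.IS_NULLABLE
, DB2.IS_NULLABLE NEW_IS_NULLABLE
FROM [old database name].INFORMATION_SCHEMA.COLUMNS DB1
     JOIN [new database name].INFORMATION_SCHEMA.COLUMNS DB2
       ON DB1.TABLE_SCHEMA = DB2.TABLE_SCHEMA
      AND DB1.TABLE_NAME = DB2.TABLE_NAME
      AND DB1.COLUMN_NAME = DB2.COLUMN_NAME
WHERE IsNull(DB1.COLUMN_DEFAULT, '') <> IsNull(DB2.COLUMN_DEFAULT, '')
   OR DB1.IS_NULLABLE <> DB2.IS_NULLABLE
ORDER BY DB1.TABLE_SCHEMA
, DB1.TABLE_NAME
, DB1.COLUMN_NAME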

Happy coding!

09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premises SQL Server; for other platforms please refer to the documentation.

Last updated: 9-Apr-2025

[SQL Server 2005] Tempdb database

  • {def} system database available as a global resource to all users connected to a Database Engine instance [1]
    • does not persist after SQL Server shuts down 
      • created at restart using the specifications from the model database 
      • doesn’t need crash recovery
      • requires rollback capabilities
        • {scenario} when updating a row in a global temporary table and rolling back the transaction [12]
          • there's no way to undo this change unless the ‘before value’ of the update was logged [12]
          • there is no need to log the ‘after value’ of the update because this value is only needed when the transaction needs to be ‘redone’ which happens during database recovery [12]
        • {scenario} during an Insert operation
          • only redo information is needed
          • when inserting a row into, say, a global temporary table, the actual ‘row value’ is not logged because SQL Server does not need the ‘row’ to undo the operation; it only needs to set the offsets within the page appropriately or, if the insert caused a new page to be allocated, to de-allocate the page [12]
          • not all objects in TempDB are subject to logging [18]
    • {characteristic} critical database
      • ⇒ needs to be configured adequately 
      •  has the highest level of create and drop actions [5]
      • under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks [5]
      • workers use tempdb like any other database
        • any worker can issue I/O to and from tempdb as needed [5]
    • if the tempdb database cannot be created, SQL Server will not start [7]
      • {recommendation} in this case, start SQL Server by using the -f startup parameter (minimal configuration)
  • files
    • data file:
      • {default} tempdb.mdf
    • log file
      • {default} templog.ldf
  • {operation} changing the tempdb database
    • {step} use the ALTER DATABASE statement with the MODIFY FILE clause to change the physical file names of each file in the tempdb database to refer to the new physical location [7] (see the T-SQL sketch after these notes)
    • {step} stop and restart SQL Server
  • {operation} create file group
    • only one file group for data and one file group for logs are allowed.
    • {default} the number of files is set to 1
    • can be a high contention point for internal tracking structures
      •  often it's better to create multiple files for tempdb
    • multiple files can be created for each file group
      • {recommendation} set the number of files to match the number of CPUs that are configured for the instance
        • ⇐ if the number of logical processors is less than or equal to eight [1]
          • otherwise, use eight data files [1]
          • if contention is still observed, increase the number of data files by multiples of four until the contention decreases to acceptable levels, or make changes to the workload [1]
        • ⇐ it’s not imperative

      • {scenario} add a file per CPU (aka multiple files)
        • allows the file group manager to 
          • pick the next file 
            • there are some optimizations here for tempdb to avoid contention and skip to the next file when under contention [4]
          •  then use the target location to start locating free space [4]                
        • ⇐ SQL Server creates a logical scheduler for each of the CPUs presented to it (logical or physical)
          • allows each of the logical schedulers to loosely align with a file [4]
            • since can be only 1 active worker per scheduler this allows each worker to have its own tempdb file (at that instant) and avoid allocation contention  [4]        
        • internal latching and other activities achieve better separation (per file) 
          • ⇒ the workers don’t cause a resource contention on a single tempdb file
            • rare type of contention [5]
        • {misconception}
          • leads to new I/O threads per file [5]       
      • {scenario} add more data files 
        • may help to solve potential performance problems that are due to I/O operations. 
        • helps to avoid a latch contention on allocation pages
          • manifested as an UP-latch
      • {downside} having too many files 
        • increases the cost of file switching
        • requires more IAM pages
        • increases the manageability overhead. 
      • {recommendation} configure the size of the files to be equal
        • ⇐ to better use the allocation mechanism (proportional fill),
      • {recommendation} create tempdb files striped across fast disks 
    • the size of the tempdb affect the performance of the database 
    • {event} server restarts
      • the tempdb file size is reset to the configured value (the default is 8 MB).
    • {feature} auto grow 
      • temporary 
        • ⇐ unlike other types of databases
      • when the tempdb grows, all transactional activity may come to a halt 
        • because TempDB is used by most operations and these activities will get blocked until more disk space gets allocated to the TempDB [12]
        • {recommendation} pre-allocate a TempDB size that matches the needs of the workload [12] (see the T-SQL sketch after these notes)
      • {recommendation} auto grow should be used only for exceptions rather than as a strategy [12]
    • transactions
      • lose the durability attribute
      • can be rolled back
        • there is no need to REDO them because the contents of tempdb do not persist across server restarts
          • because the transaction log does not need to be flushed, transactions are committed faster in tempdb than in user databases
        • most internal operations on tempdb do not generate log records because there is no need to roll back
    • {restriction} add filegroups [1]
    • {restriction} remove the primary filegroup, primary data file, or log file [1]
    • {restriction} renaming the database or primary filegroup [1]
    • {restriction} back up or restore the database [1]
    • {restriction} change collation [1]
      • the default collation is the server collation [1]
    • {restriction} change the database owner [1]
      • owned by sa
    • {restriction} create a database snapshot [1]
    • {restriction} drop the database [1]
    • {restriction} drop the guest user from the database [1]
    • {restriction} enable CDC
    • {restriction} participate in database mirroring [1]
    • {restriction} can only be configured in the simple recovery model
      • ⇒ the value can’t be changed
    • {restriction} auto shrink is not allowed for tempdb
      • database shrink and file shrink capabilities are limited
        •  because many of the hidden objects stored in tempdb cannot be moved by shrink operations
    • {restriction} the database CHECKSUM option cannot be enabled
    • {restriction} database snapshots cannot be created
    • {restriction} DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported.
      • only offline checking for DBCC CHECKTABLE is performed
        • ⇒ TAB-S lock is needed
        • there are internal consistency checks that occur when tempdb is in use
          • if the checks fail, the user connection is broken and the tempdb space used by the connection is freed. 
    • {restriction} set the database to OFFLINE
    • {restriction} set the database or primary filegroup to READ_ONLY [1]
  • used by
    • {feature} query
    • {feature} temporary tables
    • {feature} table variables
    • {feature} table-valued functions
    • {feature} user-defined functions
    • {feature} online/offline index creation
    • {feature} triggers 
    • {feature} cursors
    • {feature} RCSI
    • {feature} MARS
    • {feature} DBCC CHECK
    • {feature} LOB parameters 
    • {feature} Service Broker and event notification
    • {feature} XML and LOB variable
    • {feature} query notifications
    • {feature} database mail
  • used to store
    • user objects
      • objects explicitly created
        • user objects that can be created in a user database can also be created in tempdb [1]
          • {limitation} there's no durability guarantee [1]
          • {limitation} dropped when the Database Engine instance restarts [1]
      • {type} global and local temporary tables 
        • correspond to ## and # tables and table variables created explicitly by the application [12]
          • REDO information is not logged
      • {type} indexes on global and local temporary tables
      • {type} table variables
      • {type} tables returned in table-valued functions
      • {type} cursors
      • improved caching for temporary objects
    • temp objects
      • only cached when all of the following conditions hold: [12]
        • named constraints are not created
        • DDL statements that affect the table are not run after the temp table has been created
          • e.g. CREATE INDEX or CREATE STATISTICS statements
        • the object is not created by using dynamic SQL
          • e.g. sp_executesql N'create table #t(a int)'
    • internal objects
      • objects created internally by SQL Server
        • each object uses a minimum of nine pages [1]
          • an IAM page and an eight-page extent [1]
      • {type} work tables
        • store intermediate results for spools, cursors, sorts, and temporary LOB storage [1]
      • {type} work files
        • used for hash join or hash aggregate operations [1]
      • {type} intermediate sort results
        • used for operations 
          • creating or rebuilding indexes
            • if SORT_IN_TEMPDB is specified
          • certain GROUP BY, ORDER BY, or UNION queries
      • {restriction} applications cannot directly insert into or delete rows from them
      • {restriction} metadata is stored in memory
      • {restriction} metadata does not appear in system catalog views such as sys.all_objects.
      • {restriction} considered to be hidden objects
      • {restriction} updates to them do not generate log records
      • {restriction} page allocations do not generate log records unless on a sort unit. If the statement fails, these objects are deallocated. 
      • each object occupies at least nine pages (one IAM page and eight data pages) in tempdb
      • used
        • to store intermediate runs for sort
        • to store intermediate results for hash joins and hash aggregates.
        • to store XML variables or other LOB data type variables
          • e.g. text, image, ntext, varchar(max), varbinary(max), and all others
        • by queries that need a spool to store intermediate results
        • by keyset cursors to store the keys
        • by static cursors to store a query result
        • by Service Broker to store messages in transit
        • by INSTEAD OF triggers to store data for internal processing
        • by any feature that uses the above mentioned operations 
    • version stores
      • collections of data pages that hold the data rows that support row versioning [1]
      • {type} common version store
      • {type} online index build version store
  • storage
    • I/O characteristics
      • {requirement} read after writes
        • {def} the ability of the subsystem to service read requests with the latest data image when the read is issued after any write is successfully completed [7]
      • {recommendation} writer ordering
        • {def} the ability of the subsystem to maintain the correct order of write operations [7]
      • {recommendation} torn I/O prevention
        • {def} the ability of the system to avoid splitting individual I/O requests [7]
      • {requirement} physical sector alignment and size
        • devices are required to support sector attributes permitting SQL Server to perform writes on physical sector-aligned boundaries and in multiples of the sector size [7]
    • can be put on specialty systems
      • ⇐  they can’t be used for other databases
      • RAM disks
        • non-durable media
        • support double RAM cache
          • one in the buffer pool and one on the RAM disk
            •  directly takes away from the buffer pool’s total possible size and generally decreases the performance [7]
        • give up RAM
          • implementations of RAM disks and RAM-based file caches
      • solid state
        • high speed subsystems
        • {recommendation} confirm with the product vendor to guarantee full compliance with SQL Server I/O needs [7]
    • sort errors were frequently solved by moving tempdb to a non-caching local drive or by disabling the read caching mechanism [8]
  • {feature} logging optimization 
    • avoids logging the "after value" in certain log records in tempdb
      • when an UPDATE operation is performed without this optimization, the before and after values of the data are recorded in the log file 
    • can significantly reduce the size of the tempdb log as well as reduce the amount of I/O traffic on the tempdb log device
  • {feature} instant data file initialization 
    • skips zeroing out the NTFS file when the file is created or when the size of the file is increased
    • {benefit} minimizes overhead significantly when tempdb needs to auto grow. 
      • without this, auto grow could take a long time and lead to application timeout.
    • reduces the impact of database creation [5]
      • because zeros don’t have to be stamped in all bytes of a database file, only in the log files [5]
        • reduces the gain from using multiple threads during database creation. [5]
  • {feature} proportional fill optimization
    • reduces UP latch contention in tempdb
    • when there are multiple data files in tempdb, each file is filled in proportion to the free space that is available in the file so that all of the files fill up at about the same time
      •  accomplished by removing a latch that was taken during proportional fill.
  • {feature} deferred drop in tempdb
    • when a large temporary table is dropped by an application, it is handled by a background task and the application does not have to wait
      • ⇒ faster response time to applications.
  • {feature} worktable caching 
    • {improvement} when a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated
      • ⇐ in addition, the first nine pages for the work table are kept.
  • {feature} caches temporary objects
    • when table-valued functions, table variables, or local temporary tables are used in a stored procedure, function, or trigger, the frequent drop and creation of these temporary objects can be time consuming
      • this can cause contention on tempdb system catalog tables and allocation pages 
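
The file-related operations and recommendations above can be illustrated with a minimal T-SQL sketch. The logical file names tempdev/templog are the defaults, while the paths, sizes and growth increments are placeholder assumptions that must be adapted; the relocation becomes effective only after the instance is restarted:

-- move the tempdb files to a new location (effective after restart)
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'D:\SQLData\tempdb.mdf');

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'D:\SQLLogs\templog.ldf');

-- pre-allocate the primary data file (avoid relying on auto grow)
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 1024MB, FILEGROWTH = 256MB);

-- add a further, equally sized data file (e.g. one per logical CPU, up to eight)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'D:\SQLData\tempdb2.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);

-- review the resulting file configuration (size is stored in 8-KB pages)
SELECT name
, physical_name
, size/128 size_MB
, growth
, is_percent_growth
FROM tempdb.sys.database_files;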

References
[1] Microsoft Learn (2024) SQL Server: Tempdb database [link]
[2] Wei Xiao et al (2006) Working with tempdb in SQL Server 2005
[4] CSS SQL Server Engineers (2009) SQL Server TempDB – Number of Files – The Raw Truth [link]
[5] SQL Server Support Blog (2007) SQL Server Urban Legends Discussed [link]
[6] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
[7] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
[8] Microsoft Learn (2023) SQL Server: SQL Server diagnostics detects unreported I/O problems due to stale reads or lost writes [link]
[12a] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: Simple Example), by Sunil Agarwal [link]
[12b] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: logical structure), by Sunil Agarwal [link]
[18] Simple Talk (2020) Temporary Tables in SQL Server, by Phil Factor [link]
[19] Microsoft Learn (2023) SQL Server: Recommendations to reduce allocation contention in SQL Server tempdb database [link]

Acronyms:
DB - database
DDL - Data Definition Language
I/O - Input/Output
LOB - large object
MARS - Multiple Active Result Sets
NTFS - New Technology File System
RAM - Random-Access Memory
RCSI - Read Committed Snapshot Isolation

26 March 2025

💠🏭🗒️Microsoft Fabric: Polaris SQL Pool [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!

Unfortunately, besides the referenced papers, there's almost no material that could be used to enhance the understanding of the concepts presented. 

Last updated: 26-Mar-2025

Read and Write Operations in Polaris [2]

[Microsoft Fabric] Polaris SQL Pool

  • {def} distributed SQL query engine that powers Microsoft Fabric's data warehousing capabilities
    • designed to unify data warehousing and big data workloads while separating compute and state for seamless cloud-native operations
    • based on a robust DCP 
      • designed to execute read-only queries in a scalable, dynamic and fault-tolerant way [1]
      • a highly-available micro-service architecture with well-defined responsibilities [2]
        • data and query processing is packaged into units (aka tasks) 
          • can be readily moved across compute nodes and re-started at the task level
        • widely-partitioned data with a flexible distribution model [2]
        • a task-level "workflow-DAG" that is novel in spanning multiple queries [2]
        • a framework for fine-grained monitoring and flexible scheduling of tasks [2]
  • {component} SQL Server Front End (SQL-FE)
    • responsible for 
      • compilation
      • authorization
      • authentication
      • metadata
        • used by the compiler to 
          • {operation} generate the search space (aka MEMO) for incoming queries
          • {operation} bind metadata to data cells
          • leveraged to ensure the durability of the transaction manifests at commit [2]
            • only transactions that successfully commit need to be actively tracked to ensure consistency [2]
            • any manifests and data associated with aborted transactions are systematically garbage-collected from OneLake through specialized system tasks [2]
  • {component} SQL Server Backend (SQL-BE)
    • used to perform write operations on the LST [2]
      • inserting data into a LST creates a set of Parquet files that are then recorded in the transaction manifest [2]
      • a transaction is represented by a single manifest file that is modified concurrently by (one or more) SQL BEs [2]
        • SQL BE leverages the Block Blob API provided by ADLS to coordinate the concurrent writes  [2]
        • each SQL BE instance serializes the information about the actions it performed, either adding a Parquet file or removing it [2]
          • the serialized information is then uploaded as a block to the manifest file
          • uploading the block does not yet make any visible changes to the file [2]
            • each block is identified by a unique ID generated on the writing SQL BE [2]
        • after completion, each SQL BE returns the ID of the block(s) it wrote to the Polaris DCP [2]
          • the block IDs are then aggregated by the Polaris DCP and returned to the SQL FE as the result of the query [2]
      • the SQL FE further aggregates the block IDs and issues a Commit Block operation against storage with the aggregated block IDs [2]
        • at this point, the changes to the file on storage will become effective [2]
      • changes to the manifest file are not visible until the Commit operation on the SQL FE
        • the Polaris DCP can freely restart any part of the operation in case there is a failure in the node topology [2]
      • the IDs of any blocks written by previous attempts are not included in the final list of block IDs and are discarded by storage [2]
    • [read operations] SQL BE is responsible for reconstructing the table snapshot based on the set of manifest files managed in the SQL FE
      • the result is the set of Parquet data files and deletion vectors that represent the snapshot of the table [2]
        • queries over these are processed by the SQL Server query execution engine [2]
        • the reconstructed state is cached in memory and organized in such a way that the table state can be efficiently reconstructed as of any point in time [2]
          • enables the cache to be used by different operations operating on different snapshots of the table [2]
          • enables the cache to be incrementally updated as new transactions commit [2]
  • {feature} supports explicit user transactions
    • can execute multiple statements within the same transaction in a consistent way
      • the manifest file associated with the current transaction captures all the (reconciled) changes performed by the transaction [2]
        • changes performed by prior statements in the current transaction need to be visible to any subsequent statement inside the transaction (but not outside of the transaction) [2]
    • [multi-statement transactions] in addition to the committed set of manifest files, the SQL BE reads the manifest file of the current transaction and then overlays these changes on the committed manifests [1]
    • {write operations} the behavior of the SQL BE depends on the type of the operation.
      • insert operations 
        • only add new data and have no dependency on previous changes [2]
        • the SQL BE can serialize the metadata blocks holding information about the newly created data files just like before [2]
        • the SQL FE, instead of committing only the IDs of the blocks written by the current operation, appends them to the list of previously committed blocks
          • ⇐ effectively appends the data to the manifest file [2]
    • {update|delete operations} 
      • handled differently 
        • ⇐ since they can potentially further modify data already modified by a prior statement in the same transaction [2]
          • e.g. an update operation can be followed by another update operation touching the same rows
        • the final transaction manifest should not contain any information about the parts from the first update that were made obsolete by the second update [2]
      • SQL BE leverages the partition assignment from the Polaris DCP to perform a distributed rewrite of the transaction manifest to reconcile the actions of the current operation with the actions recorded by the previous operation [2]
        • the resulting block IDs are sent again to the SQL FE where the manifest file is committed using the (rewritten) block IDs [2]
  • {concept} Distributed Query Processor (DQP)
    • responsible for 
      • distributed query optimization
      • distributed query execution
      • query execution topology management
  • {concept} Workload Management (WLM)
    •  consists of a set of compute servers that are, simply, an abstraction of a host provided by the compute fabric, each with a dedicated set of resources (disk, CPU and memory) [2]
      • each compute server runs two micro-services
        • {service} Execution Service (ES) 
          • responsible for tracking the life span of tasks assigned to a compute container by the DQP [2]
        • {service} SQL Server instance
          • used as the back-bone for execution of the template query for a given task  [2]
            • ⇐ holds a cache on top of local SSDs 
              • in addition to in-memory caching of hot data
            • data can be transferred from one compute server to another
              • via dedicated data channels
              • the data channel is also used by the compute servers to send results to the SQL FE that returns the results to the user [2]
              • the life cycle of a query is tracked via control flow channels from the SQL FE to the DQP, and the DQP to the ES [2]
  • {concept} cell data abstraction
    • the key building block that enables to abstract data stores
      • abstracts DQP from the underlying store [1]
      • any dataset can be mapped to a collection of cells [1]
      • allows distributing query processing over data in diverse formats [1]
      • tailored for vectorized processing when the data is stored in columnar formats [1] 
      • further improves relational query performance
    • two-dimensional
      • distributions (data alignment)
      • partitions (data pruning)
    • each cell is self-contained with its own statistics [1]
      • used for both global and local QO [1]
      • cells can be grouped physically in storage [1]
      • queries can selectively reference either cell dimension or even individual cells depending on predicates and type of operations present in the query [1]
    • {concept} distributed query processing (DQP) framework
      • operates at the cell level 
      • agnostic to the details of the data within a cell
        • data extraction from a cell is the responsibility of the (single node) query execution engine, which is primarily SQL Server, and is extensible for new data types [1], [2]
  • {concept} dataset
    • logically abstracted as a collection of cells [1] 
    • can be arbitrarily assigned to compute nodes to achieve parallelism [1]
    • uniformly distributed across a large number of cells 
      • [scale-out processing] each dataset must be distributed across thousands of buckets or subsets of data objects, such that they can be processed in parallel across nodes
  • {concept} session
    • supports a spectrum of consumption models, ranging from serverless ad-hoc queries to long-standing pools or clusters [1]
    • all data are accessible from any session [1]
      • multiple sessions can access all underlying data concurrently  [1]
  • {concept} Physical Metadata layer
    • new layer introduced in the SQL Server storage engine [2]
See also: Polaris

References:
[1] Josep Aguilar-Saborit et al (2020) POLARIS: The Distributed SQL Engine in Azure Synapse, Proceedings of the VLDB Endowment PVLDB 13(12) [link]
[2] Josep Aguilar-Saborit et al (2024), Extending Polaris to Support Transactions [link]
[3] Gjnana P Duvvuri (2024) Microsoft Fabric Warehouse Deep Dive into Polaris Analytic Engine [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
[R2] Patrick Pichler (2023) Data Warehouse (Polaris) vs. Data Lakehouse (Spark) in Microsoft Fabric [link]
[R3] Tiago Balabuch (2023) Microsoft Fabric Data Warehouse - The Polaris engine [link]

Acronyms:
CPU - Central Processing Unit
DAG - Directed Acyclic Graph
DB - Database
DCP - Distributed Computation Platform 
DQP - Distributed Query Processing 
DWH - Data Warehouses 
ES - Execution Service
LST - Log-Structured Table
SQL BE - SQL Backend
SQL FE - SQL Frontend
SSD - Solid State Disk
WAL - Write-Ahead Log
WLM - Workload Management

22 March 2025

💠🛠️🗒️SQL Server: Statistics Issues [Notes]

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from the various sources. The content needs yet to be reviewed against the current documentation.

Last updated: 22-Mar-2024

[SQL Server 2005] Statistics

  • {def} objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view [2] [see notes]
  • {issue} inaccurate statistics (aka bad statistics)
    • the primary source for cardinality estimation errors [6]
    • guessed selectivity can lead to a less-than-optimal query plan
    • {cause} missing statistics
      • if an error or other event prevents the query optimizer from creating statistics, the query optimizer creates the query plan without using statistics [22] 
        • the query optimizer marks the statistics as missing and attempts to regenerate the statistics the next time the query is executed [22]
      • [SSMS] indicated as warnings when the execution plan of a query is graphically displayed [22]
    • [SQL Server Profiler] monitoring the Missing Column Statistics event class indicates when statistics are missing [22]
    • troubleshooting
      • {step} verify that AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are on [22]
      • verify that the database is not read-only [22]
        • the query optimizer cannot save statistics in read-only databases [22]
      • create the missing statistics by using the CREATE STATISTICS statement [22]
    • {cause} use of local variables
    • {cause} non-constant-foldable expressions
    • {cause} complex expressions
    • {cause} ascending key columns
      • can cause inaccurate statistics in tables with frequent INSERTs because new values all lie outside the histogram [3]
    • it is not necessary for statistics to be highly accurate to be effective [18]
  • {issue} stale statistics (aka outdated statistics)
    • statistics are outdated based on the number of modifications of the statistics (and index) columns [6]
    • SQL Server checks to see if the statistics are outdated when it looks up a plan from the cache [6]
      •  it recompiles the query if they are [6]
        • ⇐ triggers a statistics update [6]
    •  [temporary tables] can increase the number of recompilations triggered by outdated statistics [6]
    • [query hint] KEEPFIXED PLAN
      • prevents query recompilation in cases of outdated statistics [6]
        • queries are recompiled only when the schemas of the underlying tables are changed or the recompilation is forced
        • e.g. when a stored procedure is called using the WITH RECOMPILE clause [6]
    • can be identified based on
      • STATS_DATE(object_id, stats_id) returns the last update date
      • rowmodctr from sys.sysindexes (see the sketch after these notes)
    • has negative impact on performance 
      • almost always outweighs the performance benefits of avoiding automatic statistics updates/creation [2]
        • ⇐ applies also to missing statistics [2]
        • auto update/create stats should be disabled only when thorough testing has shown that there's no other way to achieve the performance or scalability you require [2]
        • without correct and up-to-date statistical information, it can be challenging to determine the best execution plan for a particular query [8]
  • {issue} missing statistics
    • {symptom} excessive number of logical reads 
      • often indicates suboptimal execution plans 
        • due to missing indexes and/or suboptimal join strategies selected 
          • because of incorrect cardinality estimation [6]
      • shouldn't be used as the only criteria during optimization 
        • further factors should be considered
          • resource usage, parallelism, related operators in the execution plan [6]
  • {issue} unused statistics 
    • a table can end up having many statistics that serve no purpose
      • it makes sense to review and clean up the statistics as part of general maintenance
        • based on the thresholds for the automatic update
  • {issue} skewed statistics 
    • may happen when the sample rate doesn’t reflect the characteristics of values’ distribution
      • {solution} using a higher sample rate 
        • update statistics manually with fullscan
  • {issue} statistics accumulate over time 
    • statistics objects are persistent and will remain in the database forever until explicitly dropped
      • this becomes expensive over time 
    • {solution} drop all auto stats 
      • will place some temporary stress on the system [11]
      • all queries that adhere to a certain pattern that requires a statistics object to be created will wait [11]
      • typically in a matter of minutes most of the missing statistics will be already back in place and the temporary stress will be over [11]
      • {tip} the cleanup can be performed at off-peak hours, ‘warming up’ the database afterwards by capturing and replaying a trace of the most common read-only queries [11]
        • create many of the required statistics objects without impacting your ‘real’ workload [11]
      • {downside} statements may deadlock on schema locks with concurrent sessions [11]
  • {issue} overlapping statistics 
    • column statistics created by SQL Server based on queries that get executed can overlap with statistics for indexes created by users on the same column [2] 
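
A minimal sketch for identifying potentially stale statistics via STATS_DATE and sys.dm_db_stats_properties (available from SQL Server 2008 R2 SP2 / 2012 SP1 onward), and for rebuilding them with a higher sample rate; the table name is a placeholder:

-- last update date and modification counter for a table's statistics (sketch)
SELECT s.name stats_name
, STATS_DATE(s.object_id, s.stats_id) last_updated
, sp.rows
, sp.rows_sampled
, sp.modification_counter
FROM sys.stats s
     CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.object_id = OBJECT_ID('dbo.MyTable')
ORDER BY sp.modification_counter DESC;

-- rebuild the statistics with a full scan (higher sample rate)
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;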


References:
[2] Ken Henderson (2003) Guru's Guide to SQL Server Architecture and Internals
[3] Kendal Van Dyke (2010) More On Overlapping Statistics [link]
[6] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.
[18] Joe Chang’s Blog (2012) Decoding STATS_STREAM, by Joe Chang
[22] MSDN (2016) Statistics [link]

Resources:

Acronyms:
SSMS - SQL Server Management Studio
  

💠🛠️🗒️SQL Server: Indexed Views [Notes]

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from the various sources. The content needs yet to be reviewed against the current documentation. 

Last updated: 22-Mar-2024

[SQL Server 2005] Indexed View

  • {def} a materialized view
    • materializes the data from the view queries, storing it in the database in a way similar to tables [6]
      • ⇒ its definition is computed and the resulting data stored just like a table [3]
      • the view is indexed by creating a unique clustered index on it
        • the resulting structure is physically identical to a table with a clustered index
          • ⇐ nonclustered indexes also are supported on this structure
      • can be created on a partitioned table, respectively can be partitioned [1]
    • {benefit} can improve the performance of some types of queries [3]
      • e.g. queries that aggregate many rows
      • ⇐ because the view is stored the same way a table with a clustered index is stored [1]
      • ⇐ not well-suited for underlying data that are frequently updated [3]
      •  more expensive to use and maintain than filtered indexes [5]
    • [query optimizer] 
      • can use it to speed up the query execution [1]
        • the view doesn't have to be referenced in the query for the optimizer to consider it for a substitution [1]
        • {downside} DML query performance can degrade significantly [1]
          • ⇐ in some cases, a query plan can't even be produced [1]
          • when executing UPDATE, DELETE or INSERT on the base table referenced, the indexed views must be updated as well [1]
          • {recommendation} test DML queries before production use [1]
            • analyze the query plan and tune/simplify the DML statement [1]
      • can use the structure to return results more efficiently to the user
        • contains logic to use this index in either of the cases 
          • the original query text referenced the view explicitly [2]
          • the user submits a query that uses the same components as the view (in any equivalent order) [2]
          • ⇐ the query processor expands indexed views early in the query pipeline and always uses the same matching code for both cases [2]
            • the WITH(NOEXPAND) hint tells the query processor not to expand the view definition [2]
            • also instructs the query processor to perform an index scan of the indexed view rather than expand it into its component parts [5]
            • any extra rows in the indexed view are reported as 8907 errors [5]
            • any missing rows are reported as 8908 errors [5]
      • expose some of the benefits of view materialization while retaining the benefits of global reasoning about query operations [2]
      • expanded (aka in-lined) before optimization begins
        • gives the Query Optimizer opportunities to optimize queries globally [2]
        • makes it difficult for the (query) optimizer to consider plans that perform the view evaluation first, then process the rest of the query [2]
          • arbitrary tree matching is a computationally complex problem, and the feature set of views is too large to perform this operation efficiently [2]
      • cases in which it does not match the view
        • indexed views are inserted into the Memo and evaluated against other plan choices
          • while they are often the best plan choice, this is not always the case [2]
          • the Query Optimizer can detect logical contradictions between the view definition and the query that references the view [2]
        • there are also some cases where the Query Optimizer does not recognize an indexed view even when it would be a good plan choice [2]
          • often, these cases deal with complex interactions between high-level features within the query processor (e.g. computed column matching, the algorithm to explore join orders) [2]
          • consider the WITH (NOEXPAND) hint to force the query processor to pick that indexed view [2]
            •  this usually is enough to get the plan to include the indexed view [2]
        • indexed view alternatives 
          • are generated and stored in the Memo 
          • are compared using costing equations against other possible plans
          • partial matches cost the residual operations as well
            • an indexed-view plan can be generated but not picked when the Query Optimizer considers other plans to have lower costs [2]
        • maintained as part of the update processing for tables on which the view is based
          • this makes sure that the view provides a consistent result if it is selected by the Query Optimizer for any query plan [2]
          • some query operations are incompatible with this design guarantee
            • restrictions are placed on the set of supported constructs in indexed views to make sure that the view can be created, matched, and updated efficiently [2]
        • {operation} updating indexed views
          • the core question behind the restrictions is “Can the query processor compute the necessary changes to the Indexed View clustered and nonclustered indexes without having to recompute the whole indexed view?” [2]
            • if so, the query processor can perform these changes efficiently as part of the maintenance of the base tables that are referenced in the view[2]
              • this property is relatively easy for filters, projections (compute scalar), and inner joins on keys[2]
              • operators that destroy or create data are more difficult to maintain, so often these are restricted from use in indexed views. [2]
        • matching indexed views is supported in cases beyond exact matches of the query text to the view definition [2]
          • it also supports using an indexed view for inexact matches where the definition of the view is broader than the query submitted by the user [2]
            • then applies residual filters, projections (columns in the select list), and even aggregates to use the view as a partial precomputation of the query result [2]
    • {concept} statistics on indexed views
      • normally statistics aren't needed
        • because the substitution of the indexed views into the query plan is considered only after all the statistics for the underlying tables and indexes are attached to the query plan [3]
        • used if the view is directly referenced by the NOEXPAND hint in a FROM clause 
          • an error is generated and the plan is not created if the NOEXPAND hint is used on a view that does not also contain an index [3]
      • can’t be created by using sp_createstats or updated by using sp_updatestats. 
      • the auto update and auto create statistics features don't work for indexed views
        • statistics must be created and updated manually
          • via CREATE STATISTICS on the indexed view columns
          • via UPDATE STATISTICS to update column or index statistics on indexed views
    • {operation} creating a view
      • requires that the underlying object’s schema can’t change
      • requires WITH SCHEMABINDING option [5]
      • ⇒ must include the two-part names of all referenced tables [5]
      • ⇐ the tables can't be dropped and the columns can't be altered while they participate in the view, unless the view is dropped [5]
      • ⇐ an error is raised [5]
      • the user must hold 
        • the CREATE VIEW permission in the database [1]
        • ALTER permission on the schema in which the view is being created [1]
        • if the base table resides within a different schema, the REFERENCES permission on the table is required as a minimum [1]
        • if the user creating the index differs from the users who created the view, for the index creation alone the ALTER permission on the view is required [1]
    • {operation} creating an index on the view
      • indexes can only be created on views that have the same owner as the referenced table or tables (aka intact ownership chain between the view and the tables) [1]
    • {operation} dropping a view
      • makes all indexes on the view to be dropped  [1]
        • ⇐ all nonclustered indexes and auto-created statistics on the view are dropped when the clustered index is dropped [1]
          • {exception} user-created statistics on the view are maintained [1]
      • nonclustered indexes can be individually dropped [1]
      • dropping the clustered index on the view 
        • removes the stored result set [1]
        • the optimizer returns to processing the view like a standard view [1]
    • {operation} disable indexes on tables and views
      • when a clustered index on a table is disabled, indexes on views associated with the table are also disabled [1]
    • {option} EXPAND VIEWS
      • allows preventing the Database Engine from using indexed views [1]
        • if any of the listed options are incorrectly set, this option prevents the optimizer from using the indexes on the views [1]
        • via OPTION (EXPAND VIEWS) hint
    • {recommendation} when using datetime and smalldatetime string literals in indexed views, explicitly convert the literal to the date type by using a deterministic date format style [1]
    • {limitation} AVG is not allowed; {workaround} use SUM and COUNT_BIG [5] (see the sketch after these notes)
    • {limitation} impacted by SET options [1]
      • {restriction} require fixed values for several SET options [1]
      • {recommendation} set the ARITHABORT user option to ON server-wide as soon as the first indexed view or index on a computed column is created in any database on the server [1]
    • {limitation} further requirements apply (see [1])
    • {limitation} aren't supported on top of temporal queries
      • ⇐ queries that use the FOR SYSTEM_TIME clause
    • {scenario} simplifying SQL queries
    • {scenario} abstracting data models from user models
    • {scenario} enforcing user security
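
A minimal sketch of an indexed view that follows the requirements above (SCHEMABINDING, two-part names, a unique clustered index, respectively SUM plus COUNT_BIG as workaround for AVG); the table and column names are hypothetical:

-- base table (hypothetical)
CREATE TABLE dbo.SalesOrders (
  OrderId int NOT NULL PRIMARY KEY
, CustomerId int NOT NULL
, Amount decimal(19,4) NOT NULL);
GO

-- the view must be schema-bound and use two-part names
CREATE VIEW dbo.vSalesByCustomer
WITH SCHEMABINDING
AS
SELECT CustomerId
, SUM(Amount) TotalAmount -- SUM & COUNT_BIG instead of AVG
, COUNT_BIG(*) OrderCount
FROM dbo.SalesOrders
GROUP BY CustomerId;
GO

-- the unique clustered index materializes the view
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByCustomer
ON dbo.vSalesByCustomer (CustomerId);
GO

-- query the indexed view directly, skipping view expansion
SELECT CustomerId
, TotalAmount / OrderCount AvgAmount
FROM dbo.vSalesByCustomer WITH (NOEXPAND);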


References:
[1] Microsoft Learn (2024) SQL Server: Create indexed views [link]
[2] Kalen Delaney et al (2009) Microsoft® SQL Server® 2008 Internals
[3] Microsoft Learn (2024) SQL Server: Views [link]
[4] Microsoft Learn (2024) SQL Server: CREATE INDEX (Transact-SQL) [link]
[5] Kalen Delaney et al (2012) Microsoft® SQL Server® 2012 Internals
[6] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.

Resources:
[R1] Microsoft Learn (2024) SQL Server: Optimize index maintenance to improve query performance and reduce resource consumption [link]

Acronyms:
DML - Data Manipulation Language
QO - Query Optimizer
