
09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premises SQL Server; for other platforms please refer to the documentation.

Last updated: 9-Apr-2025

[SQL Server 2005] Tempdb database

  • {def} system database available as a global resource to all users connected to a Database Engine instance [1]
    • does not persist after SQL Server shuts down 
      • created at restart using the specifications from the model database 
      • doesn’t need crash recovery
      • requires rollback capabilities
        • {scenario} when updating a row in a global temporary table and rolling back the transaction [12]
          • there's no way to undo this change unless the ‘before value’ of the update was logged [12]
          • there is no need to log the ‘after value’ of the update because this value is only needed when the transaction must be ‘redone’, which happens during database recovery [12]
        • {scenario} during an Insert operation
          • only redo information is needed
          • when inserting a row in, say, a global temporary table, the actual ‘row value’ is not logged, because to undo the insert SQL Server only needs to set the offsets within the page appropriately or, if the insert caused a new page to be allocated, to de-allocate the page [12]
          • not all objects in TempDB are subject to logging [18]
    • {characteristic} critical database
      • ⇒ needs to be configured adequately 
      •  has the highest level of create and drop actions [5]
      • under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks [5]
      • workers use tempdb like any other database
        • any worker can issue I/O to and from tempdb as needed [5]
    • if the tempdb database cannot be created, SQL Server will not start [7]
      • {recommendation} start SQL Server by using the (-f) startup parameter
  • files
    • data file:
      • {default} tempdb.mdf
    • log file
      • {default} templog.ldf
  • {operation} changing the tempdb database
    • {step} use ALTER database statement and MODIFY FILE clause to change the physical file names of each file in the tempdb database to refer to the new physical location [7]
    • {step} stop and restart SQL Server
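The two steps above can be sketched as follows (the target file paths are placeholders to be adjusted to the actual location; tempdev and templog are the default logical file names):

```sql
-- move tempdb files to a new location (takes effect at the next restart)
ALTER DATABASE tempdb 
MODIFY FILE (NAME = tempdev, FILENAME = 'D:\SQLData\tempdb.mdf');

ALTER DATABASE tempdb 
MODIFY FILE (NAME = templog, FILENAME = 'E:\SQLLog\templog.ldf');

-- verify the configured locations
SELECT name
, physical_name
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');
```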
  • {operation} create file group
    • only one file group for data and one file group for logs are allowed.
    • {default} the number of files is set to 1
    • can be a high contention point for internal tracking structures
      •  often it's better to create multiple files for tempdb
    • multiple files can be created for each file group
      • {recommendation} set the number of files to match the number of CPUs that are configured for the instance
        • ⇐ if the number of logical processors is less than or equal to eight [1]
          • otherwise, use eight data files [1]
          • if contention is still observed, increase the number of data files by multiples of four until the contention decreases to acceptable levels, or make changes to the workload [1]
        • ⇐ it’s not imperative

      • {scenario} add a file per CPU (aka multiple files)
        • allows the file group manager to 
          • pick the next file 
            • there are some optimizations here for tempdb to avoid contention and skip to the next file when under contention [4]
          •  then use the target location to start locating free space [4]                
        • ⇐ SQL Server creates a logical scheduler for each of the CPUs presented to it (logical or physical)
          • allows each of the logical schedulers to loosely align with a file [4]
            • since can be only 1 active worker per scheduler this allows each worker to have its own tempdb file (at that instant) and avoid allocation contention  [4]        
        • internal latching and other activities achieve better separation (per file) 
          • ⇒ the workers don’t cause a resource contention on a single tempdb file
            • rare type of contention [5]
        • {misconception}
          • leads to new I/O threads per file [5]       
      • {scenario} add more data files 
        • may help to solve potential performance problems that are due to I/O operations. 
        • helps to avoid a latch contention on allocation pages
          • manifested as UP-latch contention
      • {downside} having too many files 
        • increases the cost of file switching
        • requires more IAM pages
        • increases the manageability overhead. 
      • {recommendation} configure the size of the files to be equal
        • ⇐ to better use the allocation mechanism (proportional fill),
      • {recommendation} create tempdb files striped across fast disks 
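A minimal sketch for applying the recommendations above (logical names, sizes and paths are placeholders; all files should end up with the same size and growth so that proportional fill distributes the load evenly):

```sql
-- add a further data file with the same size and growth as the existing ones
ALTER DATABASE tempdb 
ADD FILE (NAME = tempdev2, FILENAME = 'D:\SQLData\tempdb2.ndf'
, SIZE = 1024MB, FILEGROWTH = 256MB);

-- resize an existing file so that all data files stay equal
ALTER DATABASE tempdb 
MODIFY FILE (NAME = tempdev, SIZE = 1024MB);
```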
    • the size of the tempdb affects the performance of the database
    • {event} server restarts
      • the tempdb file size is reset to the configured value (the default is 8 MB).
    • {feature} auto grow 
      • temporary 
        • ⇐ unlike for other types of databases
      • when tempdb grows, all transactional activity may come to a halt 
        • because TempDB is used by most operations and these activities will get blocked until more disk space gets allocated to the TempDB [12]
        • {recommendation} pre-allocate a TempDB size that matches the needs of the workload [12]
      • {recommendation} auto grow should be used for exceptions rather than as a strategy [12]
    • transactions
      • lose the durability attribute
      • can be rolled back
        • there is no need to REDO them because the contents of tempdb do not persist across server restarts
          • because the transaction log does not need to be flushed, transactions are committed faster in tempdb than in user databases
        • most internal operations on tempdb do not generate log records because there is no need to roll back
    • {restriction} add filegroups [1]
    • {restriction} remove the primary filegroup, primary data file, or log file [1]
    • {restriction} renaming the database or primary filegroup [1]
    • {restriction} back up or restore the database [1]
    • {restriction} change collation [1]
      • the default collation is the server collation [1]
    • {restriction} change the database owner [1]
      • owned by sa
    • {restriction} create a database snapshot [1]
    • {restriction} drop the database [1]
    • {restriction} drop the guest user from the database [1]
    • {restriction} enable CDC
    • {restriction} participate in database mirroring [1]
    • {restriction} can only be configured in the simple recovery model
      • ⇒ the value can’t be changed
    • {restriction} auto shrink is not allowed for tempdb
      • database shrink and file shrink capabilities are limited
        •  because many of the hidden objects stored in tempdb cannot be moved by shrink operations
    • {restriction} the database CHECKSUM option cannot be enabled
    • {restriction} database snapshots cannot be created
    • {restriction} DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported.
      • only offline checking for DBCC CHECKTABLE is performed
        • ⇒ TAB-S lock is needed
        • there are internal consistency checks that occur when tempdb is in use
          • if the checks fail, the user connection is broken and the tempdb space used by the connection is freed. 
    • {restriction} set the database to OFFLINE
    • {restriction} set the database or primary filegroup to READ_ONLY [1]
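For illustration, statements like the following are rejected for tempdb, each failing with an error (the backup path is a placeholder):

```sql
-- each of these statements fails for tempdb
BACKUP DATABASE tempdb TO DISK = N'D:\Backup\tempdb.bak'; -- backup not supported
ALTER DATABASE tempdb SET OFFLINE;                        -- cannot be set offline
ALTER DATABASE tempdb SET RECOVERY FULL;                  -- only simple recovery allowed
DROP DATABASE tempdb;                                     -- cannot be dropped
```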
  • used by
    • {feature} query
    • {feature} temporary tables
    • {feature} table variables
    • {feature} table-valued functions
    • {feature} user-defined functions
    • {feature} online/offline index creation
    • {feature} triggers 
    • {feature} cursors
    • {feature} RCSI
    • {feature} MARS
    • {feature} DBCC CHECK
    • {feature} LOB parameters 
    • {feature} Service Broker and event notification
    • {feature} XML and LOB variable
    • {feature} query notifications
    • {feature} database mail
  • used to store
    • user objects
      • objects explicitly created
        • user objects that can be created in a user database can also be created in tempdb [1]
          • {limitation} there's no durability guarantee [1]
          • {limitation} dropped when the Database Engine instance restarts [1]
      • {type} global and local temporary tables 
        • correspond to ## and # tables and table variables created explicitly by the application [12]
          • REDO information is not logged
      • {type} indexes on global and local temporary tables
      • {type} table variables
      • {type} tables returned in table-valued functions
      • {type} cursors
      • improved caching for temporary objects
    • temp objects
      • only cached when all of the following conditions are met: [12]
        • named constraints are not created
        • DDL statements that affect the table are not run after the temp table has been created
          • e.g. CREATE INDEX or CREATE STATISTICS statements
        • the table is not created by using dynamic SQL
          • e.g. sp_executesql N'create table #t(a int)'
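As illustration, the first procedure below creates a temp table that qualifies for caching, while the second disqualifies it via a named constraint, respectively DDL after creation (the procedure and object names are hypothetical):

```sql
-- cacheable: no named constraints, no post-creation DDL, no dynamic SQL
CREATE PROCEDURE dbo.pCacheable
AS
BEGIN
	CREATE TABLE #t (
	  Id int NOT NULL PRIMARY KEY -- unnamed (system-named) constraint is fine
	);
	-- ...
END
GO

-- not cacheable: named constraint, respectively DDL after creation
CREATE PROCEDURE dbo.pNotCacheable
AS
BEGIN
	CREATE TABLE #t (
	  Id int NOT NULL CONSTRAINT PK_t PRIMARY KEY -- named constraint
	);
	CREATE INDEX IX_t ON #t(Id); -- DDL after the table was created
	-- ...
END
```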
    • internal objects
      • objects created internally by SQL Server
        • each object uses a minimum of nine pages [1]
          • an IAM page and an eight-page extent [1]
      • {type} work tables
        • store intermediate results for spools, cursors, sorts, and temporary LOB storage [1]
      • {type} work files
        • used for hash join or hash aggregate operations [1]
      • {type} intermediate sort results
        • used for operations 
          • creating or rebuilding indexes
            • if SORT_IN_TEMPDB is specified
          • certain GROUP BY, ORDER BY
          • UNION queries
      • {restriction} applications cannot directly insert into or delete rows from them
      • {restriction} metadata is stored in memory
      • {restriction} metadata does not appear in system catalog views such as sys.all_objects.
      • {restriction} considered to be hidden objects
      • {restriction} updates to them do not generate log records
      • {restriction} page allocations do not generate log records unless on a sort unit. If the statement fails, these objects are deallocated. 
      • each object occupies at least nine pages (one IAM page and eight data pages) in tempdb
      • used
        • to store intermediate runs for sort
        • to store intermediate results for hash joins and hash aggregates.
        • to store XML variables or other LOB data type variables
          • e.g. text, image, ntext, varchar(max), varbinary(max), and all others
        • by queries that need a spool to store intermediate results
        • by keyset cursors to store the keys
        • by static cursors to store a query result
        • by Service Broker to store messages in transit
        • by INSTEAD OF triggers to store data for internal processing
        • by any feature that uses the above mentioned operations 
    • version stores
      • collections of data pages that hold the data rows that support row versioning [1]
      • {type} common version store
      • {type} online index build version store
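How much tempdb space is taken by user objects, internal objects and the version store can be checked via the sys.dm_db_file_space_usage DMV:

```sql
-- tempdb space usage per category (in MB)
SELECT Cast(SUM(user_object_reserved_page_count)/128.0 as decimal(18,2)) user_objects_mb
, Cast(SUM(internal_object_reserved_page_count)/128.0 as decimal(18,2)) internal_objects_mb
, Cast(SUM(version_store_reserved_page_count)/128.0 as decimal(18,2)) version_store_mb
, Cast(SUM(unallocated_extent_page_count)/128.0 as decimal(18,2)) free_space_mb
FROM tempdb.sys.dm_db_file_space_usage;
```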
  • {} storage
    • I/O characteristics
      • {requirement} read after writes
        • {def} the ability of the subsystem to service read requests with the latest data image when the read is issued after any write is successfully completed [7]
      • {recommendation} write ordering
        • {def} the ability of the subsystem to maintain the correct order of write operations [7]
      • {recommendation} torn I/O prevention
        • {def} the ability of the system to avoid splitting individual I/O requests [7]
      • {requirement} physical sector alignment and size
        • devices are required to support sector attributes permitting SQL Server to perform writes on physical sector-aligned boundaries and in multiples of the sector size [7]
    • can be put on specialty systems
      • ⇐  they can’t be used for other databases
      • RAM disks
        • non-durable media
        • lead to a double RAM cache
          • one in the buffer pool and one on the RAM disk
            • this directly takes away from the buffer pool’s total possible size and generally decreases performance [7]
        • require giving up RAM
          • ⇐ applies to implementations of RAM disks and RAM-based file caches
      • solid state
        • high speed subsystems
        • {recommendation} confirm with the product vendor to guarantee full compliance with SQL Server I/O needs [7]
    • sort errors were frequently solved by moving tempdb to a non-caching local drive or by disabling the read caching mechanism [8]
  • {feature} logging optimization 
    • avoids logging the "after value" in certain log records in tempdb
      • when an UPDATE operation is performed without this optimization, the before and after values of the data are recorded in the log file 
    • can significantly reduce the size of the tempdb log as well as reduce the amount of I/O traffic on the tempdb log device
  • {feature} instant data file initialization 
    • skips zeroing out the NTFS file when the file is created or when its size is increased
    • {benefit} minimizes overhead significantly when tempdb needs to auto grow. 
      • without this, auto grow could take a long time and lead to application timeout.
    • reduces the impact of database creation [5]
      • because zeros don’t have to be stamped in all bytes of a data file, only in the log files [5]
        • reduces the gain from using multiple threads during database creation. [5]
  • {feature} proportional fill optimization
    • reduces UP latch contention in tempdb
    • when there are multiple data files in tempdb, each file is filled in proportion to the free space that is available in the file so that all of the files fill up at about the same time
      •  accomplished by removing a latch that was taken during proportional fill.
  • {feature} deferred drop in tempdb
    • when a large temporary table is dropped by an application, it is handled by a background task and the application does not have to wait
      • ⇒ faster response time to applications.
  • {feature} worktable caching 
    • {improvement} when a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated
      • ⇐ in addition, the first nine pages for the work table are kept.
  • {feature} caches temporary objects
    • when table-valued functions, table variables, or local temporary tables are used in a stored procedure, function, or trigger, the frequent drop and create of these temporary objects can be time consuming
      •  this can cause contentions on tempdb system catalog tables and allocation pages 
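Allocation contention in tempdb typically shows up as PAGELATCH waits on allocation pages (PFS/GAM/SGAM) of database id 2; a common diagnostic sketch along the lines of [19]:

```sql
-- sessions currently waiting on tempdb pages
SELECT session_id
, wait_type
, wait_duration_ms
, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%'; -- database id 2 = tempdb
```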

References
[1] Microsoft Learn (2024) SQL Server: Tempdb database [link]
[2] Wei Xiao et al (2006) Working with tempdb in SQL Server 2005
[4] CSS SQL Server Engineers (2009) SQL Server TempDB – Number of Files – The Raw Truth [link]
[5] SQL Server Support Blog (2007) SQL Server Urban Legends Discussed [link]
[6] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
[7] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
[8] Microsoft Learn (2023) SQL Server: SQL Server diagnostics detects unreported I/O problems due to stale reads or lost writes [link]
[12a] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: Simple Example), by Sunil Agarwal [link]
[12b] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: logical structure), by Sunil Agarwal [link]
[18] Simple Talk (2020) Temporary Tables in SQL Server, by Phil Factor [link]
[19] Microsoft Learn (2023) SQL Server: Recommendations to reduce allocation contention in SQL Server tempdb database [link]

Acronyms:
DB - database
DDL - Data Definition Language
I/O - Input/Output
LOB - large object
MARS - Multiple Active Result Sets
NTFS - New Technology File System
RAM - Random-Access Memory
RCSI - Read Committed Snapshot Isolation

25 December 2024

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part II: Under the Hood) 🆕

The first tests performed with Microsoft Fabric's SQL databases (see previous post) revealed that, with small exceptions, many of the objects and administration scripts I built for SQL Server 2005 through 2022 worked with minimal changes, though the tests focused only on the standard basic functionality - the use of user-defined functions, stored procedures, views, respectively DMVs. As usual, the devil probably hides in the details.

First of all, one can easily check current database's configuration settings via the sys.configurations:

-- database configuration values
SELECT cnf.configuration_id
, cnf.name
, cnf.description
, cnf.value 
, cnf.minimum 
, cnf.maximum 
, cnf.value_in_use 
, cnf.is_dynamic 
, cnf.is_advanced 
FROM sys.configurations cnf
WHERE value <> 0 
--WHERE name LIKE 'data%'
ORDER BY cnf.name 

One can look also at the database-scoped configuration via the sys.database_scoped_configurations, especially at the records whose value deviates from the default:

-- SQL database - database scoped configuration
SELECT name
, value
, value_for_secondary
, is_value_default 
FROM sys.database_scoped_configurations
WHERE is_value_default <>1 --search for deviations from default
--WHERE name=N'MAXDOP' -- search for individual values
ORDER BY name
Output:
name value value_for_secondary is_value_default
MAXDOP 8 False

Querying the sys.databases revealed that there seems to be only one system database available, namely the master database used to retrieve the definition of the system objects. In what concerns the various properties, it's probably important to check the compatibility level, the collation name, and other settings that apply:

-- database information
SELECT db.database_id
, db.[name] AS database_name
, db.recovery_model_desc 
, db.log_reuse_wait_desc 
, db.[compatibility_level] 
, db.page_verify_option_desc 
, db.collation_name
, db.user_access_desc
, db.is_change_feed_enabled
, db.is_data_lake_replication_enabled
FROM sys.databases AS db
Output:
database_id database_name recovery_model_desc log_reuse_wait_desc compatibility_level page_verify_option_desc collation_name user_access_desc is_change_feed_enabled is_data_lake_replication_enabled
1 master FULL NOTHING 160 CHECKSUM SQL_Latin1_General_CP1_CI_AS MULTI_USER False False
26 AdventureWorks01-... FULL NOTHING 160 CHECKSUM SQL_Latin1_General_CP1_CI_AS MULTI_USER True True

Unfortunately, it's not possible to change a database's collation, though other collations on individual columns are supported [2]. Compared to the standard SQL Server, it's not possible to change the compatibility level to a previous value. It will be interesting to see whether an SQL database will use always the latest version of SQL Server or whether future compatibility levels are supported in parallel.

There is no tempdb listed, though querying the objects from tempdb directly by using 3-part references allows retrieving their content. For example, the following query retrieves the various database files currently available:

-- SQL database: tempdb files 
SELECT dbf.file_id
, dbf.name file_name
--, dbf.physical_name
, dsp.name file_group
--, type 
, dbf.type_desc file_type
--, dbf.growth growth_kb
, Cast(dbf.growth/128.0  as decimal(18,2)) growth_mb
--, dbf.is_percent_growth
--, dbf.max_size max_size_kb
, Cast(NullIf(dbf.max_size, -1)/128.0  as decimal(18,2)) max_size_mb
--, dbf.size file_size_kb
, Cast(dbf.size/128.0 as decimal(18,2)) file_size_mb
, dbf.state_desc 
, dbf.is_read_only 
FROM tempdb.sys.database_files dbf
     LEFT JOIN tempdb.sys.data_spaces dsp
       ON dbf.data_space_id = dsp.data_space_id
ORDER BY dbf.Name
Output:
file_id file_name file_group file_type growth_mb max_size_mb file_size_mb state_desc is_read_only
1 tempdev PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
11 tempdev10 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
12 tempdev11 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
13 tempdev12 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
14 tempdev13 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
15 tempdev14 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
16 tempdev15 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
17 tempdev16 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
3 tempdev2 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
4 tempdev3 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
5 tempdev4 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
6 tempdev5 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
7 tempdev6 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
8 tempdev7 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
9 tempdev8 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
10 tempdev9 PRIMARY ROWS 256.00 32768.00 16.00 ONLINE False
2 templog LOG 64.00 216256.00 16.00 ONLINE False

By removing the reference to tempdb from the query, one can get the same information for the current database. For the SalesLT database only two data files and one log file were created.

There seems to be no msdb database, used primarily by the SQL Server Agent for scheduling alerts and jobs. This implies that the functionality based on it won't work and one needs to find alternatives!

Looking at the feature comparison from [2], the most important gaps are the lack of support for Always Encrypted, auditing, bulk inserts, change data capture (CDC), elastic queries, full-text search, in-memory optimization, ledger and OPENROWSET. In addition, it supports cross-database three-part name queries via the SQL analytics endpoint [2].

Until now, the most annoying fact is that in the web UI the results are returned in different "pages", which makes it a bit more challenging to navigate the output. However, using a cursor to iterate through the results and saving the content to a table solves the problem (see link). The last query from the post was used to retrieve the system objects together with the number of records returned.

Happy coding!


References:
[1] Microsoft Learn (2024) Frequently asked questions for SQL database in Microsoft Fabric (preview) [link]
[2] Microsoft Learn (2024) Features comparison: Azure SQL Database and SQL database in Microsoft Fabric (preview) [link]

01 February 2023

💎SQL Reloaded: Alternatives for Better Code Maintainability in SQL Server & Azure Synapse I

Introduction

Queries can become quite complex and with the increased complexity they'll be harder to read and/or maintain. Since the early days of SQL Server, views and table-valued user-defined functions (UDFs) are the standard ways of encapsulating logic for reuse in queries, allowing to minimize the duplication of logic. Then came the common table expressions (CTEs), which allow a further layer of structuring the code, independently whether a view or UDF was used. 

These are the 3 main options that can be combined in various ways to better structure the code. On the other side, a temporary table or table variable could also be used for the same purpose, though they have further implications.

To exemplify the various approaches, let's consider a simple query based on two tables from the AdventureWorks database. For the sake of simplicity, further business rules have been left out.

Inline Subqueries

-- products with open purchase orders
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN ( -- cumulated open purchase orders by product
		SELECT POL.ProductId 
		, SUM(POL.OrderQty) PurchQty
		FROM Purchasing.PurchaseOrderDetail POL
		WHERE OrderQty - (ReceivedQty - RejectedQty)>0
		GROUP BY POL.ProductId 
	) POL
	ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

As can be seen, the logic for the "Open purchase orders" result set is built within an inline subquery (aka inline view). As its logic becomes more complex, the simplest way to handle this is to move it into a CTE.

Common Table Expressions (CTEs)

A common table expression can be thought of as a temporary result set defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE or CREATE VIEW statement [1]. Thus, the CTE can't be reused between queries.

The inline query is moved at the beginning within a WITH statement to which is given a proper name that allows easier identification later:

-- products with open purchase orders (common table expression)
WITH OpenPOs
AS (-- cumulated open purchase orders by product
	SELECT POL.ProductId 
	, SUM(POL.OrderQty) PurchQty
	FROM Purchasing.PurchaseOrderDetail POL
	WHERE OrderQty - (ReceivedQty - RejectedQty)>0
	GROUP BY POL.ProductId 
)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN OpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

Thus, this allows us to rewrite the JOIN as if it were between two tables. Multiple CTEs can be used as well, with or without any dependencies between them. Moreover, CTEs allow building recursive queries (see example).

There is no inherent performance gain or loss in using a CTE. It's important to know that the result set is not cached; therefore, if the same CTE is referenced multiple times within a query, it will also be "executed" the same number of times. Except for the cases in which the database engine uses a spool operator to save intermediate query results for a CTE, no work table will be created in tempdb for CTEs.
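For example, when the same CTE is referenced twice within a query, its logic can be evaluated twice:

```sql
-- the CTE is referenced (and can be evaluated) twice
WITH OpenPOs
AS (-- cumulated open purchase orders by product
	SELECT POL.ProductId 
	, SUM(POL.OrderQty) PurchQty
	FROM Purchasing.PurchaseOrderDetail POL
	WHERE OrderQty - (ReceivedQty - RejectedQty)>0
	GROUP BY POL.ProductId 
)
SELECT A.ProductId
, A.PurchQty
, A.PurchQty * 100.0 / B.TotalQty PercentOfTotal
FROM OpenPOs A
     CROSS JOIN (-- grand total over the same CTE
		SELECT SUM(PurchQty) TotalQty 
		FROM OpenPOs
	 ) B
ORDER BY A.ProductId
```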

If the inline query needs to be reused in several queries, defining a view is a better alternative.

Views

A view is a database object used to encapsulate a query and that can be referenced from other queries much like a table. In fact, it's also referred to as a "virtual table". A view can't be executed by itself (as stored procedures can). No data, only the definition of the view is stored, and the various actions that can be performed on database objects can be performed on views as well.

-- creating the view
CREATE VIEW dbo.vOpenPurchaseOrders
AS
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 

-- testing the view
SELECT top 10 *
FROM dbo.vOpenPurchaseOrders

Once the view is created, it can be called from any query:

-- products with open purchase orders (view)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN dbo.vOpenPurchaseOrders POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

Besides the schema binding, there are no additional costs for using views. However, views have several limitations (see [2]). Moreover, it's not possible to use parameters with views, scenarios in which tabled-valued UDFs can help.

Indexed Views 

Starting with SQL Server 2000, it's possible to materialize the data in a view, storing the results of the view in a clustered index on disk, in the same way a table with a clustered index is stored. This type of view is called an indexed view (aka materialized view, though the concept is used slightly differently in Azure Synapse) and for long-running queries it can provide considerable performance gains. In case the view contains a GROUP BY, its definition must contain COUNT_BIG(*) and must not contain HAVING.

-- dropping the view
--DROP VIEW IF EXISTS Purchasing.vOpenPOs

-- create view
CREATE VIEW Purchasing.vOpenPOs
WITH SCHEMABINDING
AS
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
, COUNT_BIG(*) Count
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 
GO

--Create an index on the view.
CREATE UNIQUE CLUSTERED INDEX IDX_vOpenPOs
   ON Purchasing.vOpenPOs (ProductId);

--testing the view
SELECT top 100 *
FROM Purchasing.vOpenPOs

-- products with open purchase orders (indexed view)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
     JOIN Purchasing.vOpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

When an indexed view is defined on a table, the query optimizer may use it to speed up the query execution even if it wasn't referenced in the query. Besides the restriction of the view to be deterministic, further limitations apply (see [6]).

Table-Valued Functions

A table-valued function is a user-defined function that returns a table as a result, as opposed to a single data value, as scalar functions do.

Let's suppose that we need to restrict the logic to a time interval. We'd then need to provide the StartDate & EndDate as parameters. Compared with other UDFs, table-valued functions, as their name implies, need to return a table:

-- creating the UDF function 
CREATE FUNCTION dbo.tvfOpenPurchaseOrdersByProduct( 
  @StartDate date 
, @EndDate date) 
RETURNS TABLE 
AS RETURN ( 
	SELECT POL.ProductId 
	, SUM(POL.OrderQty) PurchQty
	FROM Purchasing.PurchaseOrderDetail POL
	WHERE OrderQty - (ReceivedQty - RejectedQty)>0
	  AND POL.DueDate BETWEEN @StartDate AND @EndDate
	GROUP BY POL.ProductId 
)

-- testing the UDF
SELECT top 10 *
FROM dbo.tvfOpenPurchaseOrdersByProduct('2014-01-01', '2014-12-31')

A table-valued function can be used as a "table with parameters" in JOINs:

-- products with open purchase orders (table-valued function)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN dbo.tvfOpenPurchaseOrdersByProduct('2014-01-01', '2014-12-31') POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

The parameters are optional, though in such cases using a view might still be a better idea. Table-valued functions used to have poor performance compared with views, and in certain scenarios they might still perform poorly. Their benefit resides in allowing one to pass and use parameters in the logic, which can make them irreplaceable. Moreover, multi-statement table-valued functions can be built as well (see example)!

Notes:
1) When evaluating table-valued functions for usage consider their limitations as well (see [3])!
2) Scalar UDFs can be used to simplify the code as well, though they apply only to single values, therefore they are not considered in here!
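As mentioned above, a multi-statement table-valued function can encapsulate the same logic; a minimal sketch (the function name is illustrative):

```sql
-- multi-statement table-valued function (fills and returns a declared table)
CREATE FUNCTION dbo.mtvfOpenPurchaseOrders()
RETURNS @OpenPOs TABLE (
  ProductId int NOT NULL
, PurchQty decimal(8,2) NOT NULL
)
AS 
BEGIN
	-- cumulated open purchase orders by product
	INSERT INTO @OpenPOs
	SELECT POL.ProductId 
	, SUM(POL.OrderQty) PurchQty
	FROM Purchasing.PurchaseOrderDetail POL
	WHERE OrderQty - (ReceivedQty - RejectedQty)>0
	GROUP BY POL.ProductId 

	RETURN
END
```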

Temporary Tables 

A temporary table is a base table that is stored and managed in tempdb as any other table. It exists only while the database session in which it was created is active. Therefore, it can be called multiple times, behaving much like a standard table:

-- create the temp table
CREATE TABLE dbo.#OpenPOs (
  ProductId int NOT NULL
, PurchQty decimal(8,2) NOT NULL
)

-- insert the cumulated purchase orders
INSERT INTO #OpenPOs
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 

-- products with open purchase orders (temporary table)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
     JOIN dbo.#OpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

-- drop the table (cleaning)
-- DROP TABLE IF EXISTS dbo.#OpenPOs;

Being created in tempdb, a system database shared by all the databases of an instance, a temporary table's performance relies on tempdb's configuration and workload. Moreover, the concurrent creation of temporary tables from many sessions can lead to tempdb metadata contention, as each session attempts to update metadata information in the system base tables.

Temporary tables are logged, which adds more burden on the database engine, however being able to create indexes on them and use statistics can help processing result sets more efficiently, especially when called multiple times. 

Also, a temporary table might be cached (see [1]) rather than deleted when its purpose ends, which allows operations that drop and create the objects to execute very quickly and reduces page allocation contention.

Table Variables

A table variable is a variable of type TABLE that can be used in functions, stored procedures, and batches. The construct is similar to a temp table: it is stored in tempdb as well and cached under certain scenarios. However, table variables are scoped to the batch or routine in which they are defined and are destroyed after that.

-- create the table variable
DECLARE @OpenPOs TABLE (
  ProductId int NOT NULL
, PurchQty decimal(8,2) NOT NULL
)

-- insert the cumulated purchase orders
INSERT INTO @OpenPOs
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 

-- products with open purchase orders (table variable)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
     JOIN @OpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

Table variables don’t participate in transactions or locking, and DML operations performed on them are not logged. No statistics are maintained on them either, and data changes impacting a table variable will not cause recompilation. Thus, they are usually faster than temporary tables, especially when their size is small, though their performance also depends on how they are used. On the other hand, for big result sets and/or when several calls are involved, a temporary table could prove more efficient.
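The transactional behavior is easy to verify with a small experiment: a ROLLBACK does not undo changes made to a table variable (a minimal sketch):

```sql
-- table variables are not affected by ROLLBACK (minimal sketch)
DECLARE @Test TABLE (
  Id int NOT NULL
)

BEGIN TRANSACTION;
INSERT INTO @Test VALUES (1);
ROLLBACK TRANSACTION;

-- the inserted row survives the rollback
SELECT count(*) RowsKept
FROM @Test; -- returns 1
```

Running the same experiment against a temporary table would return 0, since temporary tables participate fully in transactions.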

Important!!! Temporary tables and table variables are means of improving the performance of long-running queries. Being able to move pieces of logic around helps in maintaining the code and provides a logical structure of the steps; however, they shouldn't be used if a performance gain isn't the target! Overusing them as a technique can considerably decrease the performance of tempdb, which can have an impact in other areas!

Azure Synapse

Moving to Azure Synapse, there are several important limitations concerning the above (see [4]). Even if some features are supported, further limitations might apply. What's important to note is that materialized views act like indexed views in standard SQL Server, and that CETAS (CREATE EXTERNAL TABLE AS SELECT) statements are available to import/export data to the supported file formats in Hadoop, Azure Blob Storage or Azure Data Lake Storage Gen2.
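A CETAS statement follows the pattern below. This is a hedged sketch: the external data source (MyDataLake) and file format (ParquetFormat) are assumed to have been created beforehand, and the object names are illustrative:

```sql
-- CETAS sketch: export aggregated data as parquet files
-- (MyDataLake and ParquetFormat are assumed external objects)
CREATE EXTERNAL TABLE dbo.OpenPOsExport
WITH (
    LOCATION = 'exports/openpos/'    -- folder within the data source
,   DATA_SOURCE = MyDataLake         -- assumed external data source
,   FILE_FORMAT = ParquetFormat      -- assumed external file format
)
AS
SELECT POL.ProductId
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
GROUP BY POL.ProductId;
```

The statement both writes the files to the target location and creates an external table over them, so the exported data can be queried afterwards like any table.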

Feature | Dedicated | Serverless | SQL Server
CTEs | Yes | Yes | Yes (2005+)
Recursive CTEs | No | No | Yes (2005+)
Views | Yes | Yes | Yes
Indexed views | No | No | Yes
Materialized views | Yes | No | No
Table-valued functions (single statement) | No | Yes | Yes
Table-valued functions (multi-statement) | No | No | Yes
Scalar UDFs | Yes | No | Yes
Tables | Yes | No | Yes
Temporary tables (local) | Yes | Limited | Yes
Temporary tables (global) | No | No | Yes
Table variables | Yes | Yes | Yes
CETAS | Yes | Limited | Yes (2022+)

Notes:
1) CETAS has two important limitations in serverless SQL pools:
    a) once the data has been exported to a file, it can't be overwritten via the same syntax;
    b) logic based on temporary tables can't be exported via pipelines.
2) Temporary tables can be used to replace cursors (see example).
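As a hedged sketch of the cursor-replacement pattern, a temporary table can hold the set of rows to iterate over, with a flag marking the processed ones (names are illustrative, based on the AdventureWorks schema):

```sql
-- iterating over rows via a temp table instead of a cursor (sketch)
SELECT ProductId
, cast(0 as bit) Processed
INTO #Queue
FROM Production.Product;

DECLARE @ProductId int;

WHILE EXISTS (SELECT * FROM #Queue WHERE Processed = 0)
BEGIN
    -- pick the next unprocessed row
    SELECT TOP 1 @ProductId = ProductId
    FROM #Queue
    WHERE Processed = 0;

    -- ... per-row processing goes here ...

    -- mark the row as processed
    UPDATE #Queue
    SET Processed = 1
    WHERE ProductId = @ProductId;
END

DROP TABLE #Queue;
```

For set-based problems a plain query is of course preferable; the pattern is useful only when the processing genuinely has to happen row by row.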


Resources:
[1] Microsoft Learn (2012) Capacity Planning for tempdb (link)
[2] Microsoft Learn (2023) CREATE View (link)
[3] Microsoft Learn (2023) CREATE Function (link)
[4] Microsoft Learn (2023) Transact-SQL features supported in Azure Synapse SQL (link)
[5] Redgate (2018) Choosing Between Table Variables and Temporary Tables (ST011, ST012), by Phil Factor (link)
[6] Microsoft Learn (2023) Create indexed views (link)
[7] Microsoft Learn (2023) CREATE MATERIALIZED VIEW AS SELECT (Transact-SQL) (link)
[8] Microsoft Learn (2023) CETAS with Synapse SQL (link)

28 January 2023

💎SQL Reloaded: Temporary Tables and Tempdb in Serverless SQL Pool

In SQL Server, temporary tables are stored in the tempdb database, and the front-end (FE) SQL Server of a serverless SQL pool is no exception, even though no such database appears in the list of available databases. In today's session "Azure Synapse serverless - CETAS vs Views vs Parquet" at the Data Toboggan winter edition 2023, Brian Bønk pointed out that the tables available in the tempdb database can actually be queried:

-- Azure Serverless SQL pool: tempdb tables
SELECT top 10 *
FROM tempdb.information_schema.tables
TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE
tempdb | dbo | #dm_continuous_copy_status_..._0000000029C7 | BASE TABLE
tempdb | dbo | diff_MonDmSbsConnections | BASE TABLE
tempdb | dbo | diff_MonTceMasterKeys | BASE TABLE
tempdb | dbo | diff_MonTceColEncryptionKey | BASE TABLE
tempdb | dbo | diff_MonAutomaticTuningState | BASE TABLE
tempdb | dbo | diff_MonDmTranActiveTransactions | BASE TABLE
tempdb | dbo | diff_MonTceEnclaveUsageInfo | BASE TABLE
tempdb | dbo | diff_MonTceEnclaveColEncryptionKey | BASE TABLE
tempdb | dbo | diff_MonTceEnclaveMasterKeys | BASE TABLE
tempdb | dbo | diff_MonTceColumnInfo | BASE TABLE

Moreover, the content of the system tables available can be queried, even if some of the tables might have no data:
 
-- Azure Serverless SQL pool: checking a table's content
SELECT top 10 *
FROM tempdb.dbo.dmv_view_run_history

How about the temporary tables created? To check this, let's reuse a temp table created recently for listing the DMVs available in the serverless SQL pool:

-- dropping the temp table
DROP TABLE IF EXISTS dbo.#views;

-- create the temp table
CREATE TABLE dbo.#views (
  ranking int NOT NULL
, view_name nvarchar(150) NOT NULL
)

-- inserting a few records
INSERT INTO #views
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) view_name
FROM sys.all_views obj
WHERE obj.Type = 'V'
  AND obj.is_ms_shipped = 1
  AND obj.name LIKE 'dm_exec_requests%'

-- checking temp table's content
SELECT *
FROM dbo.#views
ranking | view_name
1 | sys.dm_exec_requests
2 | sys.dm_exec_requests_history

Here's temp table's metadata:
 
-- checking temp table's metadata
SELECT *
FROM tempdb.information_schema.tables
WHERE table_name like '#views%'
TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE
tempdb | dbo | #views_____..._____00000000295B | BASE TABLE

Of course, you can also query tempdb.sys.all_objects:

-- checking temp table's metadata
SELECT *
FROM tempdb.sys.all_objects
WHERE name like '#views%'

You can now use the table name returned by either of the two queries to query the table directly (just replace the name accordingly):

-- querying the table via tempdb (name will differ)
SELECT *
FROM tempdb.dbo.[#views______________________________________________________________________________________________________________00000000295B]
ranking | view_name
1 | sys.dm_exec_requests
2 | sys.dm_exec_requests_history

The benefit of knowing this is negligible; however, the topic is interesting for those who want to understand how the pools in Azure Synapse work and what was kept or removed compared with the standard editions of SQL Server.

Another topic of interest for DBAs is how many files are created for tempdb, given that the database is used to store intermediate data for various operations. Finding the optimal number of tempdb files and configuring them was, and still is, one of the main concerns when setting up a SQL Server instance for optimal performance.

The old query developed for standard SQL Server seems to work as well:
 
-- Azure Serverless SQL pool: tempdb files 
SELECT dbf.file_id
, dbf.name file_name
--, dbf.physical_name
, dsp.name file_group
--, type 
, dbf.type_desc file_type
--, dbf.growth growth_kb
, Cast(dbf.growth/128.0  as decimal(18,2)) growth_mb
, dbf.is_percent_growth
--, dbf.max_size max_size_kb
, Cast(NullIf(dbf.max_size, -1)/128.0  as decimal(18,2)) max_size_mb
--, dbf.size file_size_kb
, Cast(dbf.size/128.0 as decimal(18,2)) file_size_mb
, dbf.state_desc 
, dbf.is_read_only 
FROM tempdb.sys.database_files dbf
     LEFT JOIN tempdb.sys.data_spaces dsp
       ON dbf.data_space_id = dsp.data_space_id
ORDER BY dbf.Name
file_id | file_name | file_group | file_type | growth_mb | is_percent_growth | max_size_mb | file_size_mb | state_desc | is_read_only
1 | tempdev | PRIMARY | ROWS | 32 | 0 | 102670 | 16 | ONLINE | 0
3 | tempdev2 | PRIMARY | ROWS | 32 | 0 | 102670 | 16 | ONLINE | 0
4 | tempdev3 | PRIMARY | ROWS | 32 | 0 | 102670 | 16 | ONLINE | 0
5 | tempdev4 | PRIMARY | ROWS | 32 | 0 | 102670 | 16 | ONLINE | 0
2 | templog | NULL | LOG | 64 | 0 | 16384 | 255 | ONLINE | 0

So, four data files of 16 MB each were created, with a fixed growth of 32 MB, and each file can grow up to about 100 GB. It would be interesting to check how the files grow under heavy workload and what kind of issues this raises. At least in serverless SQL pools, many of the views one used to use for troubleshooting aren't accessible. Then again, one may not have to go that far, given that the resources are managed automatically.

Notes
(1) Microsoft recommends not dropping temporary tables explicitly, but letting SQL Server handle the cleanup automatically, thus taking advantage of the Optimistic Latching Algorithm, which helps prevent contention on TempDB [1].

Happy coding!

Last updated: Oct-2024


References:
[1] Haripriya SB (2024) Do NOT drop #temp tables (link)
