SQL Troubles: tempdb

Showing posts with label tempdb. Show all posts

09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premise SQL Server, for other platforms please refer to the documentation.

Last updated: 9-Apr-2025

[SQL Server 2005] Tempdb database

{def} system database available as a global resource to all users connected to a Database Engine instance [1]

does not persist after SQL Server shuts down

created at restart using the specifications from the model database
doesn’t need crash recovery
requires rollback capabilities

{scenario} when updating a row in a global temporary table and rolling back the transaction [12]

there's no way to undo this change unless was logged the ‘before value’ of the update [12]
there is no need to log the ‘after value’ of the update because this value is only needed when the transaction needs to be ‘redone’ which happens during database recovery [12]

{scenario} during an Insert operation

only redo information is needed
when insert a row in say a global temporary table, the actual ‘row value’ is not logged because SQL Server does not need the ‘row’ to undo except it needs to set the offsets within the page appropriately or if this insert had caused a new page to be allocate, de-allocate the page [12]

only redo information is needed

not all objects in TempDB are subject to logging [18]

{characteristic} critical database

⇒ needs to be configured adequately
has the highest level of create and drop actions [5]
under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks [5]
workers use tempdb like any other database

any worker can issue I/O to and from tempdb as needed [5]

if the tempdb database cannot be created, SQL Server will not start [7]

{recommendation} start SQL Server by using the (-f) startup parameter

files

data file:

{default} dempdb.mdf

log file

{default} templog.ldf

{operation} changing the tempdb database

{step} use ALTER database statement and MODIFY FILE clause to change the physical file names of each file in the tempdb database to refer to the new physical location [7]
{step} stop and restart SQL Server

{operation} create file group

only one file group for data and one file group for logs are allowed.
{default} the number of files is set to 1
can be a high contention point for internal tracking structures

often it's better to create multiple files for tempdb

multiple files can be created for each file group

{recommendation} set the number of files to match the number of CPUs that are configured for the instance

⇐ if the number of logical processors is less than or equal to eight [1]

otherwise, use eight data files [1]
contention is still observed, increase the number of data files by multiples of four until the contention decreases to acceptable levels, or make changes to the workload [1]

⇐ it’s not imperative

{scenario}add a file per CPU (aka multiple files)

allows the file group manager to

pick the next file

there are some optimizations here for tempdb to avoid contention and skip to the next file when under contention [4]

then use the target location to start locating free space [4]

⇐ SQL Server creates a logical scheduler for each of the CPUs presented to it (logical or physical)

allows each of the logical schedulers to loosely align with a file [4]

since can be only 1 active worker per scheduler this allows each worker to have its own tempdb file (at that instant) and avoid allocation contention [4]

internal latching and other activities achieve better separation (per file)

⇒ the workers don’t cause a resource contention on a single tempdb file

rare type of contention [5]

{misconception}

leads to new I/O threads per file [5]

{scenario} add more data files

may help to solve potential performance problems that are due to I/O operations.
helps to avoid a latch contention on allocation pages

⇐ manifested as a UP-latch)

{downside} having too many files

increases the cost of file switching
requires more IAM pages
increases the manageability overhead.

{recommendation} configure the size of the files to be equal

⇐ to better use the allocation mechanism (proportional fill),

{recommendation} create tempdb files striped across fast disks

the size of the tempdb affect the performance of the database
{event} server restarts

the tempdb file size is reset to the configured value (the default is 8 MB).

{feature} auto grow

temporary

⇐ unlike other types of databases)

when the tempdb grows, all transactional activity may come to an halt

because TempDB is used by most operations and these activities will get blocked until more disk space gets allocated to the TempDB [12]
{recommendation} pre-allocate the size of the TempDB that matches the needs of workload [12]

{recommendation} should only be used more for exceptions rather than as a strategy [12]

transactions

lose the durability attribute
can be rolled back

there is no need to REDO them because the contents of tempdb do not persist across server restarts

because the transaction log does not need to be flushed, transactions are committed faster in tempdb than in user databases. In tempdb, transactions

most internal operations on tempdb do not generate log records because there is no need to roll back

{restriction} add filegroups [1]
{restriction} remove the primary filegroup, primary data file, or log file [1]
{restriction} renaming the database or primary filegroup [1]
{restriction} back up or restore the database [1]
{restriction} change collation [1]

the default collation is the server collation [1]

{restriction} change the database owner [1]

owned by sa

{restriction} create a database snapshot [1]
{restriction} drop the database [1]
{restriction} drop the quest user from the database [1]
{restriction} enable CDC
{restriction} participate in database mirroring [1]
{restriction} can only be configured in the simple recovery model

⇒ the value can’t be changed

{restriction} auto shrink is not allowed for tempdb

database shrink and file shrink capabilities are limited

⇐ because many of the hidden objects stored in tempdb cannot be moved by shrink operations

{restriction} the database CHECKSUM option cannot be enabled
{restriction} database snapshots cannot be created
{restriction} DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported.

only offline checking for DBCC CHECKTABLE is performed

⇒ TAB-S lock is needed
here are internal consistency checks that occur when tempdb is in use

if the checks fail, the user connection is broken and the tempdb space used by the connection is freed.

{restriction} set the database to OFFLINE
{restriction} set the database or primary filegroup to READ_ONLY [1]

used by

{feature} query
{feature} temporary tables
{feature} table variables
{feature} table-valued functions
{feature} user-defined functions
{feature} online/offline index creation
{feature} triggers
{feature} cursors
{feature} RCSI
{feature} MARS
{feature} DBCC CHECK
{feature} LOB parameters
{feature} Service Broker and event notification
{feature} XML and LOB variable
{feature} query notifications
{feature} database mail

used to store

user objects

objects explicitly created

user objects that can be created in a user database can also be created in tempdb [1]

{limitation} there's no durability guarantee [1]
{limitation} dropped when the Database Engine instance restarts [1]

{type} global and local temporary tables

correspond to ## and # tables and table variables created explicitly by the application [12]

REDO information is not logged

{type} indexes on global and local temporary tables
{type} table variables
{type} tables returned in table-valued functions
{type} cursors
improved caching for temporary objects

temp objects

only cached when none of the following conditions is violated: [12]

named constraints are not created

DDL statements that affect the table are not run after the temp table has been created

e.g. CREATE INDEX or CREATE STATISTICS statements

not created by using dynamic SQ

e.g. sp_executesql N'create table #t(a int)'

internal objects

objects created internally by SQL Server

each object uses a minimum of nine pages [1]

an IAM page and an eight-page extent [1]

{type} work tables

store intermediate results for spools, cursors, sorts, and temporary LOB storage [1]

{type} work files

used for for hash join or hash aggregate operations [1]

{type} intermediate sort results

used for operations

creating or rebuilding indexes

if SORT_IN_TEMPDB is specified

certain GROUP BY, ORDER BY
UNION queries

{restriction} applications cannot directly insert into or delete rows from them
{restriction} metadata is stored in memory
{restriction} metadata does not appear in system catalog views such as sys.all_objects.
{restriction} considered to be hidden objects
{restriction} updates to them do not generate log records
{restriction} page allocations do not generate log records unless on a sort unit. If the statement fails, these objects are deallocated.
each object occupies at least nine pages (one IAM page and eight data pages) in tempdb
used

to store intermediate runs for sort
to store intermediate results for hash joins and hash aggregates.
to store XML variables or other LOB data type variables

e.g. text, image, ntext, varchar(max), varbinary(max), and all others

by queries that need a spool to store intermediate results
by keyset cursors to store the keys
by static cursors to store a query result
by Service Broker to store messages in transit
by INSTEAD OF triggers to store data for internal processing
by any feature that uses the above mentioned operations

version stores

collections of data pages that hold the data rows that support row versioning [1]
{type} common version store
{type} online index build version store

{} storage

I/O characteristics

{requirement} read after rights

{def} the ability of the subsystem to service read requests with the latest data image when the read is issued after any write is successfully completed [7]

{recommendation} writer ordering

{def} the ability of the subsystem to maintain the correct order of write operations [7]

{recommendation} torn I/O prevention

{def} the ability of the system to avoid splitting individual I/O requests [7]

{requirement} physical sector alignment and size

devices are required to support sector attributes permitting SQL Server to perform writes on physical sector-aligned boundaries and in multiples of the sector size [7]

can be put on specialty systems

⇐ they can’t be used for other databases
RAM disks

non-durable media
support double RAM cache

one in the buffer pool and one on the RAM disk

⇐ directly takes away from the buffer pool’s total possible size and generally decreases the performance [7]

give up RAM

implementations of RAM disks and RAM-based files caches

solid state

high speed subsystems
{recommendation} confirm with the product vendor to guarantee full compliance with SQL Server I/O needs [7]

sort errors were frequently solved by moving tempdb to a non-caching local drive or by disabling the read caching mechanism [8]

{feature} logging optimization

avoids logging the "after value" in certain log records in tempdb

when an UPDATE operation is performed without this optimization, the before and after values of the data are recorded in the log file

can significantly reduce the size of the tempdb log as well as reduce the amount of I/O traffic on the tempdb log device

{feature} instant data file initialization

isn’t zeroing out the NTFS file when the file is created or when the size of the file is increased
{benefit} minimizes overhead significantly when tempdb needs to auto grow.

without this, auto grow could take a long time and lead to application timeout.

reduces the impact of database creation [5]

because zero’s don’t have to be stamped in all bytes of a database file, only the log files [5]

reduces the gain from using multiple threads during database creation. [5]

{feature} proportional fill optimization

reduces UP latch contention in tempdb
when there are multiple data files in tempdb, each file is filled in proportion to the free space that is available in the file so that all of the files fill up at about the same time

⇐ accomplished by removing a latch that was taken during proportional fill.

{feature} deferred drop in tempdb

when a large temporary table is dropped by an application, it is handled by a background task and the application does not have to wait

⇒ faster response time to applications.

{feature} worktable caching

{improvement} when a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated

⇐ in addition, the first nine pages for the work table are kept.

{feature} caches temporary objects

when a table-valued functions, table variables, or local temporary tables are used in a stored procedure, function, or trigger, the frequent drop and create of these temporary objects can be time consuming

⇐ this can cause contentions on tempdb system catalog tables and allocation pages

Previous Post <<||>> Next Post

References

[1] Microsoft Learn (2024) SQL Server: Tempdb database [link]
[2] Wei Xiao et al (2006) Working with tempdb in SQL Server 2005

[4] CSS SQL Server Engineers (2009) SQL Server TempDB – Number of Files – The Raw Truth [link]
[5] SQL Server Support Blog (2007) SQL Server Urban Legends Discussed [link]

[6] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]

[7] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
[8] Microsoft Learn (2023) SQL Server: SQL Server diagnostics detects unreported I/O problems due to stale reads or lost writes [link]

[12a] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: Simple Example), by Sunil Agarwal [link]
[12b] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: logical structure), by Sunil Agarwal [link]

[18] Simple Talk (2020) Temporary Tables in SQL Server, by Phil Factor [link]

[19] Microsoft Learn (2023) SQL Server: Recommendations to reduce allocation contention in SQL Server tempdb database [link]

Acronyms:
DB - database
DDL - Data Definition Language
I/O - Input/Output

LOB - large object
MARS - Multiple Active Result Sets

NTFS - New Technology File System

RAM - Random-Access Memory

RCSI - Read Committed Snapshot Isolation

25 December 2024

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part II: Under the Hood) 🆕

The first tests performed with Microsoft Fabric's SQL databases (see previous post) revealed that with small exceptions many of the objects and administration scripts I built for SQL Server 2005 up to 2022 worked until now with minimal changes, though the tests made focused only on the standard basic functionality - the use of user-defined functions, stored procedures, views, respectively DMVs. As usual, the devil probably hides in details.

First of all, one can easily check current database's configuration settings via the sys.configurations:

-- database configuration values
SELECT cnf.configuration_id
, cnf.name
, cnf.description
, cnf.value 
, cnf.minimum 
, cnf.maximum 
, cnf.value_in_use 
, cnf.is_dynamic 
, cnf.is_advanced 
FROM sys.configurations cnf
WHERE value <> 0 
--WHERE name LIKE 'data%'
ORDER BY cnf.name

One can look also at the database-scoped configuration via the sys.database_scoped_configurations, especially at the records whose value deviates from the default:

-- SQL database - database scoped configuration
SELECT name
, value
, value_for_secondary
, is_value_default 
FROM sys.database_scoped_configurations
WHERE is_value_default <>1 --search for deviations from default
--WHERE name=N'MAXDOP' -- search for individual values
ORDER BY name

Output:

name	value	value_for_secondary	is_value_default
MAXDOP	8		False

Querying the sys.databases revealed that there seems to be only one system database available, namely the master database used to retrieve the definition of the system objects. In what concerns the various properties, it's probably important to check the compatibility level, the collation name, and other settings that apply:

-- database information
SELECT db.database_id
, db.[name] AS database_name
, db.recovery_model_desc 
, db.log_reuse_wait_desc 
, db.[compatibility_level] 
, db.page_verify_option_desc 
, db.collation_name
, db.user_access_desc
, db.is_change_feed_enabled
, db.is_data_lake_replication_enabled
FROM sys.databases AS db

Output:

database_id	database_name	recovery_model_desc	log_reuse_wait_desc	compatibility_level	page_verify_option_desc	collation_name	user_access_desc	is_change_feed_enabled	is_data_lake_replication_enabled
1	master	FULL	NOTHING	160	CHECKSUM	SQL_Latin1_General_CP1_CI_AS	MULTI_USER	False	False
26	AdventureWorks01-...	FULL	NOTHING	160	CHECKSUM	SQL_Latin1_General_CP1_CI_AS	MULTI_USER	True	True

Unfortunately, it's not possible to change a database's collation, though other collations on individual columns are supported [2]. Compared to the standard SQL Server, it's not possible to change the compatibility level to a previous value. It will be interesting to see whether an SQL database will use always the latest version of SQL Server or whether future compatibility levels are supported in parallel.

There is no tempdb listed, though querying directly the objects from tempdb by using 3-part references allows to retrieve their content. For example, the following query retrieves the various database files available currently:

-- SQL database: tempdb files 
SELECT dbf.file_id
, dbf.name file_name
--, dbf.physical_name
, dsp.name file_group
--, type 
, dbf.type_desc file_type
--, dbf.growth growth_kb
, Cast(dbf.growth/128.0  as decimal(18,2)) growth_mb
--, dbf.is_percent_growth
--, dbf.max_size max_size_kb
, Cast(NullIf(dbf.max_size, -1)/128.0  as decimal(18,2)) max_size_mb
--, dbf.size file_size_kb
, Cast(dbf.size/128.0 as decimal(18,2)) file_size_mb
, dbf.state_desc 
, dbf.is_read_only 
FROM tempdb.sys.database_files dbf
     LEFT JOIN tempdb.sys.data_spaces dsp
       ON dbf.data_space_id = dsp.data_space_id
ORDER BY dbf.Name

Output:

file_id	file_name	file_group	file_type	growth_mb	max_size_mb	file_size_mb	state_desc	is_read_only
1	tempdev	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
11	tempdev10	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
12	tempdev11	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
13	tempdev12	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
14	tempdev13	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
15	tempdev14	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
16	tempdev15	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
17	tempdev16	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
3	tempdev2	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
4	tempdev3	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
5	tempdev4	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
6	tempdev5	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
7	tempdev6	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
8	tempdev7	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
9	tempdev8	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
10	tempdev9	PRIMARY	ROWS	256.00	32768.00	16.00	ONLINE	False
2	templog		LOG	64.00	216256.00	16.00	ONLINE	False

By removing the reference to tempdb from the query, one can get the same information for the current database. For the SalesLT were created only two database and one log file.

There seems to be no msdb database, used primarily by the SQL Server Agent for scheduling alerts and jobs. This implies that the functionality based on it won't work and one needs to find alternatives!

Looking at the feature comparison from [2], the most important are the lack of support for always encrypted, auditing, bulk inserts, change data capture (CDC), elastic queries, full-text search, in-memory optimization, ledger, OPENROWSET. In addition it supports cross-database three-part name queries via the SQL analytics endpoint [2].

Until now, the most annoying fact is that in the web UI results are returned in different "pages", and thus makes it a bit more challenging to navigate the output. However, using a cursor to iterate through the results and saving the content to a table solves the problem (see link). The last query from the post was used to retrieve the system objects together with the number of records returned.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Frequently asked questions for SQL database in Microsoft Fabric (preview) [link]
[2] Microsoft Learn (2024) Features comparison: Azure SQL Database and SQL database in Microsoft Fabric (preview) [link]

01 February 2023

💎SQL Reloaded: Alternatives for Better Code Maintainability in SQL Server & Azure Synapse I

Introduction

Queries can become quite complex and with the increased complexity they'll be harder to read and/or maintain. Since the early days of SQL Server, views and table-valued user-defined functions (UDFs) are the standard ways of encapsulating logic for reuse in queries, allowing to minimize the duplication of logic. Then came the common table expressions (CTEs), which allow a further layer of structuring the code, independently whether a view or UDF was used.

These are the main 3 options that can be combined in various ways to better structure the code. On the other side, also a temporary table or table variable could be used for the same purpose, though they have further implications.

To exemplify the various approaches, let's consider a simple query based on two tables from the AdventureWorks database. For the sake of simplicity, further business rules have been left out.

Inline Subqueries

-- products with open purchase orders
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN ( -- cumulated open purchase orders by product
		SELECT POL.ProductId 
		, SUM(POL.OrderQty) PurchQty
		FROM Purchasing.PurchaseOrderDetail POL
		WHERE OrderQty - (ReceivedQty - RejectedQty)>0
		GROUP BY POL.ProductId 
	) POL
	ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

As can be seen, the logic for the "Open purchase orders" result set is built within an inline subquery (aka inline view). As its logic becomes more complex, the simplest way to handle this is to move it into a CTE.

Common Table Expressions (CTEs)

A common table expression can be thought of as a temporary result set defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE or CREATE VIEW statement [1]. Thus, the CTE can't be reused between queries.

The inline query is moved at the beginning within a WITH statement to which is given a proper name that allows easier identification later:

-- products with open purchase orders (common table expression)
WITH OpenPOs
AS (-- cumulated open purchase orders by product
	SELECT POL.ProductId 
	, SUM(POL.OrderQty) PurchQty
	FROM Purchasing.PurchaseOrderDetail POL
	WHERE OrderQty - (ReceivedQty - RejectedQty)>0
	GROUP BY POL.ProductId 
)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN OpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

Thus, this allows us to rewrite the JOIN as if it were between two tables. Multiple CTEs can be used as well, with or without any dependencies between them. Moreover, CTEs allow building recursive queries (see example).

There is no performance gain or loss by using a CTE. It's important to know that the result set is not cached, therefore, if the same CTE is called multiple times (within a query), it will be also "executed" for the same number of times. Except the cases in which the database engine uses a spool operator to save intermediate query results for a CTE, there will be created no work table in tempdb for CTEs.

If the inline query needs to be reused in several queries, defining a view is a better alternative.

Views

A view is a database object used to encapsulate a query and that can be referenced from other queries much like a table. In fact, it's also referred as a "virtual table". A view can't be execute by itself (as stored procedures do. No data, only the definition of the view is stored, and the various actions that can be performed on database objects can be performed on views as well.

-- creating the view
CREATE VIEW dbo.vOpenPurchaseOrders
AS
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 

-- testing the view
SELECT top 10 *
FROM dbo.vOpenPurchaseOrders

Once the view is created, it can be called from any query:

-- products with open purchase orders (table-valued function)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN dbo.vOpenPurchaseOrders POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

Besides the schema binding, there are no additional costs for using views. However, views have several limitations (see [2]). Moreover, it's not possible to use parameters with views, scenarios in which tabled-valued UDFs can help.

Indexed Views

Starting with SQL Server 2015, it's possible to materialize the data in a view, storing the results of the view in a clustered index on the disk in same way a table with a clustered index is stored. This type of view is called an indexed view (aka materialized view, though the concept is used slightly different in Azure Synapse) and for long-running queries can provide considerable performance gains. In case the view contains a GROUP BY is present, its definition must contain COUNT_BIG(*) and must not contain HAVING.

-- dropping the view
--DROP VIEW IF EXISTS Purchasing.vOpenPOs

-- create view
CREATE VIEW Purchasing.vOpenPOs
WITH SCHEMABINDING
AS
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
, COUNT_BIG(*) Count
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 
GO

--Create an index on the view.
CREATE UNIQUE CLUSTERED INDEX IDX_vOpenPOs
   ON Purchasing.vOpenPOs (ProductId);

--testing the view
SELECT top 100 *
FROM Purchasing.vOpenPOs

-- products with open purchase orders (indexed view)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
     JOIN Purchasing.vOpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

When an indexed view is defined on a table, the query optimizer may use it to speed up the query execution even if it wasn't referenced in the query. Besides the restriction of the view to be deterministic, further limitations apply (see [6]).

Table-Valued Functions

A table-valued function is a user-defined function in which returns a table as a result, as opposed to a single data value, as scalar functions do.

Let's support that we need to restrict base the logic based on a time interval. We'd need then to provide the StartDate & EndDate as parameters. Compared with other UDFs table-valued functions, as their name implies, need to return a table:

-- creating the UDF function 
CREATE FUNCTION dbo.tvfOpenPurchaseOrdersByProduct( 
  @StartDate date 
, @EndDate date) 
RETURNS TABLE 
AS RETURN ( 
	SELECT POL.ProductId 
	, SUM(POL.OrderQty) PurchQty
	FROM Purchasing.PurchaseOrderDetail POL
	WHERE OrderQty - (ReceivedQty - RejectedQty)>0
	  AND POL.DueDate BETWEEN @StartDate AND @EndDate
	GROUP BY POL.ProductId 
)

-- testing the UDF
SELECT top 10 *
FROM dbo.tvfOpenPurchaseOrdersByProduct('2014-01-01', '2014-12-31')

A table-valued function can be used as a "table with parameters" in JOINs:

-- products with open purchase orders (table-valued function)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN dbo.tvfOpenPurchaseOrdersByProduct('2014-01-01', '2014-12-31') POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

The parameters are optional, though in such cases using a view might still be a better idea. Table-valued functions used to have poor performance in the past compared with views and in certain scenarios they might still perform poorly. Their benefit resides in allowing to pass and use parameters in the logic, which can make them irreplaceable. Moreover, multi-statement table-valued functions can be built as well (see example)!

Notes:
1) When evaluating table-valued functions for usage consider their limitations as well (see [3])!
2) Scalar UDFs can be used to simplify the code as well, though they apply only to single values, therefore they are not considered in here!

Temporary Tables

A temporary table is a base table that is stored and managed in tempdb as any other table. It exists only while the database session in which it was created is active. Therefore, it can be called multiple times, behaving much like a standard table:

-- create the temp table
CREATE TABLE dbo.#OpenPOs (
  ProductId int NOT NULL
, PurchQty decimal(8,2) NOT NULL
)

-- insert the cumulated purchase orders
INSERT INTO #OpenPOs
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 

-- products with open purchase orders (table-valued function)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
     JOIN dbo.#OpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

-- drop the table (cleaning)
-- DROP TABLE IF EXISTS dbo.#OpenPOs;

Being created in the tempdb, system database shared by several databases, temporary table's performance relies on tempdb's configuration and workload. Moreover, the concurrent creation of temporary tables from many sessions can lead to tempdb metadata contention, as each session attempts updating metadata information in the system based tables.

Temporary tables are logged, which adds more burden on the database engine, however being able to create indexes on them and use statistics can help processing result sets more efficiently, especially when called multiple times.

Also, a temporary table might be cached (see [1]) and not deleted when its purpose ends, which allows operations that drop and create the objects to execute very quickly and reduces page allocation contention.

Table Variables

A table variable is a variable of type TABLE and can be used in functions, stored procedures, and batches. The construct is similar to the temp table and is stored as well in the tempdb and cached under certain scenarios, however they are scoped to the batch or routine in which they are defined and destroyed after that.

-- create the table variable
DECLARE @OpenPOs TABLE (
  ProductId int NOT NULL
, PurchQty decimal(8,2) NOT NULL
)

-- insert the cumulated purchase orders
INSERT INTO @OpenPOs
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 

-- products with open purchase orders (table variable)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
     JOIN @OpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

Table variables don’t participate in transactions or locking, while DML operations done on them are not logged. There are also no statistics maintained and any data changes impacting the table variable will not cause recompilation. Thus, they are usually faster than temporary variables, especially when their size is small, though their performance depends also on how they are used. On the other side, for big result sets and/or when several calls are involved, a temporary table could prove to be more efficient.

Important!!! Temporary tables and table variables are means of improving the performance of long-running queries. Being able to move pieces of logic around helps in maintaining the code and it also provides a logical structure of the steps, however they shouldn't be used if the performance gain is not the target! Overusing them as technique can considerably decrease the performance of tempdb, which can have impact in other areas!

Azure Synapse

Moving to Azure Synapse there are several important limitations in what concerns the above (see [4]). Even if some features are supported, further limitations might apply. What's important to note is that materialized views act like indexed view in standard SQL Server and that CETAS (Create External Table as SELECT) are available to import/export data to the supported file formats in Hadoop, Azure storage blob or Azure Data Lake Storage Gen2.

Feature	Dedicated	Serverless	SQL Server
CTEs	Yes	Yes	Yes (2015+)
Recursive CTEs	No	No	Yes (2015+)
Views	Yes	Yes	Yes
Indexed views	No	No	Yes
Materialized views	Yes	No	No
Table-valued functions (single statement)	No	Yes	Yes
Table-valued functions (multi-statement)	No	No	Yes
Scalar UDFs	Yes	No	Yes
Tables	Yes	No	Yes
Temporary tables (local)	Yes	Limited	Yes
Temporary tables (global)	No	No	Yes
Table variables	Yes	Yes	Yes
CETAS	Yes	Limited	Yes (2022+)

Notes:
1) CETAS have two important limitations in serverless SQL Pool
a) once the data were exported to a file, they can't be overwritten via the same syntax;
b) logic based on temporary tables can't be exported via pipelines.
2) Temporary tables can be used to replace cursors (see example).

Previous Post <<||>> Next Post

Resources:
[1] Microsoft Learn (2012) Capacity Planning for tempdb (link)
[2] Microsoft Learn (2023) CREATE View (link)
[3] Microsoft Learn (2023) CREATE Function (link)
[4] Microsoft Learn (2023) Transact-SQL features supported in Azure Synapse SQL (link)
[5] Redgate (2018) Choosing Between Table Variables and Temporary Tables (ST011, ST012), by Phil Factor (link)
[6] Microsoft Learn (2023) Create indexed views (link)
[7] Microsoft Learn (2023) CREATE MATERIALIZED VIEW AS SELECT (Transact-SQL) (link)
[8] Microsoft Learn (2023) CETAS with Synapse SQL (link)

28 January 2023

💎SQL Reloaded: Temporary Tables and Tempdb in Serverless SQL Pool

In SQL Server, temporary tables are stored in tempdb database and the front end (FE) SQL Server from a serverless SQL Server pool makes no exception from it, however there's no such database listed in the list of databases available. Watching today Brian Bønk's session on "Azure Synapse serverless - CETAS vs Views vs Parquet" at Data Toboggan winter edition 2023, the speaker pointed out that the tables available in tempdb database can be actually queried:

-- Azure Serverless SQL pool: tempdb tables
SELECT top 10 *
FROM tempdb.information_schema.tables

TABLE_CATALOG	TABLE_SCHEMA	TABLE_NAME	TABLE_TYPE
tempdb	dbo	#dm_continuous_copy_status_..._0000000029C7	BASE TABLE
tempdb	dbo	diff_MonDmSbsConnections	BASE TABLE
tempdb	dbo	diff_MonTceMasterKeys	BASE TABLE
tempdb	dbo	diff_MonTceColEncryptionKey	BASE TABLE
tempdb	dbo	diff_MonAutomaticTuningState	BASE TABLE
tempdb	dbo	diff_MonDmTranActiveTransactions	BASE TABLE
tempdb	dbo	diff_MonTceEnclaveUsageInfo	BASE TABLE
tempdb	dbo	diff_MonTceEnclaveColEncryptionKey	BASE TABLE
tempdb	dbo	diff_MonTceEnclaveMasterKeys	BASE TABLE
tempdb	dbo	diff_MonTceColumnInfo	BASE TABLE

Moreover, the content of the system tables available can be queried, even if some of the tables might have no data:

-- Azure Serverless SQL pool: checking a table's content
SELECT top 10 *
FROM tempdb.dbo.dmv_view_run_history

How about the temporary tables created? To check this, let's reuse a temp table created recently for listing the DMVs available in the serverlss SQL pool:

-- dropping the temp table
DROP TABLE IF EXISTS dbo.#views;

-- create the temp table
CREATE TABLE dbo.#views (
  ranking int NOT NULL
, view_name nvarchar(150) NOT NULL
)

-- inserting a few records
INSERT INTO #views
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) view_name
FROM sys.all_views obj
WHERE obj.Type = 'V'
  AND obj.is_ms_shipped = 1
  AND obj.name LIKE 'dm_exec_requests%'

-- checking temp table's content
SELECT *
FROM dbo.#views

ranking	view_name
1	sys.dm_exec_requests
2	sys.dm_exec_requests_history

Here's temp table's metadata:

-- checking temp table's metadata
SELECT *
FROM tempdb.information_schema.tables
WHERE table_name like '#views%'

TABLE_CATALOG	TABLE_SCHEMA	TABLE_NAME	TABLE_TYPE
tempdb	dbo	#views_____..._____00000000295B	BASE TABLE

Of course, you can query also the tempdb.sys.all_objects:

-- checking temp table's metadata
SELECT *
FROM tempdb.sys.all_objects
WHERE name like '#views%'

You can use now the table name returned by any of the two queries to call the table (just replace the name accordingly):

-- querying the table via tempdb (name will differ)
SELECT *
FROM tempdb.dbo.[#views______________________________________________________________________________________________________________00000000295B]

ranking	view_name
1	sys.dm_exec_requests
2	sys.dm_exec_requests_history

The benefit of knowing that is neglectable, however the topic is interesting more for the ones who want to know how the pools in Azure Synapse work, what was kept or removed compared with the standard editions of SQL Server.

Thus, another topic interesting for DBAs would be how many files are created for the tempdb, given that the database is used to store intermediate data for the various operations. Finding the optimal number of tempdb files and configuring them was and still is one of the main concerns when configuring an SQL Server instance for optimal performance.

The old query developed for standard SQL Server seems to work as well:

-- Azure Serverless SQL pool: tempdb files 
SELECT dbf.file_id
, dbf.name file_name
--, dbf.physical_name
, dsp.name file_group
--, type 
, dbf.type_desc file_type
--, dbf.growth growth_kb
, Cast(dbf.growth/128.0  as decimal(18,2)) growth_mb
, dbf.is_percent_growth
--, dbf.max_size max_size_kb
, Cast(NullIf(dbf.max_size, -1)/128.0  as decimal(18,2)) max_size_mb
--, dbf.size file_size_kb
, Cast(dbf.size/128.0 as decimal(18,2)) file_size_mb
, dbf.state_desc 
, dbf.is_read_only 
FROM tempdb.sys.database_files dbf
     LEFT JOIN tempdb.sys.data_spaces dsp
       ON dbf.data_space_id = dsp.data_space_id
ORDER BY dbf.Name

file_id	file_name	file_group	file_type	growth_mb	is_percent_growth	max_size_mb	file_size_mb	state_desc	is_read_only
1	tempdev	PRIMARY	ROWS	32	0	102670	16	ONLINE	0
3	tempdev2	PRIMARY	ROWS	32	0	102670	16	ONLINE	0
4	tempdev3	PRIMARY	ROWS	32	0	102670	16	ONLINE	0
5	tempdev4	PRIMARY	ROWS	32	0	102670	16	ONLINE	0
2	templog	NULL	LOG	64	0	16384	255	ONLINE	0

So, there were created 4 data files of 16 MB each, with a fix growth of 32 MB, and the files can grow up to about 100 GB. What would be interesting to check is how the files grow under heavy workload, what kind of issues this raises, etc. At least in serverless SQL pools many of the views one used to use for troubleshooting aren't accessible. Maybe one doesn't have to go that far, given that the resources are managed automatically.

Notes:

(1) Microsoft recommends not to drop the temporary tables explicitly, but let SQL Server handle this cleanup automatically and take thus advantage of the Optimistic Latching Algorithm, which helps prevent contention on TempDB [1].

Happy coding!

Last updated: Oct-2024

Previous Post <<||>> Next Post

References:
[1] Haripriya SB (2024) Do NOT drop #temp tables (link)

SQL Troubles

Pages

09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

25 December 2024

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part II: Under the Hood) 🆕

01 February 2023

💎SQL Reloaded: Alternatives for Better Code Maintainability in SQL Server & Azure Synapse I

28 January 2023

💎SQL Reloaded: Temporary Tables and Tempdb in Serverless SQL Pool

About Me