Introduction
Queries can become quite complex and with the increased complexity they'll be harder to read and/or maintain. Since the early days of SQL Server, views and table-valued user-defined functions (UDFs) are the standard ways of encapsulating logic for reuse in queries, allowing to minimize the duplication of logic. Then came the common table expressions (CTEs), which allow a further layer of structuring the code, independently whether a view or UDF was used.
These are the main 3 options that can be combined in various ways to better structure the code. On the other side, also a temporary table or table variable could be used for the same purpose, though they have further implications.
To exemplify the various approaches, let's consider a simple query based on two tables from the AdventureWorks database. For the sake of simplicity, further business rules have been left out.
Inline Subqueries
-- products with open purchase orders
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
JOIN ( -- cumulated open purchase orders by product
SELECT POL.ProductId
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId
) POL
ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber
As can be seen, the logic for the "Open purchase orders" result set is built within an inline subquery (aka inline view). As its logic becomes more complex, the simplest way to handle this is to move it into a CTE.
Common Table Expressions (CTEs)
A common table expression can be thought of as a temporary result set defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE or CREATE VIEW statement [1]. Thus, the CTE can't be reused between queries.
The inline query is moved at the beginning within a WITH statement to which is given a proper name that allows easier identification later:
-- products with open purchase orders (common table expression)
WITH OpenPOs
AS (-- cumulated open purchase orders by product
SELECT POL.ProductId
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId
)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
JOIN OpenPOs POL
ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber
Thus, this allows us to rewrite the JOIN as if it were between two tables. Multiple CTEs can be used as well, with or without any dependencies between them. Moreover, CTEs allow building recursive queries (see example).
There is no performance gain or loss by using a CTE. It's important to know that the result set is not cached, therefore, if the same CTE is called multiple times (within a query), it will be also "executed" for the same number of times. Except the cases in which the database engine uses a spool operator to save intermediate query results for a CTE, there will be created no work table in tempdb for CTEs.
If the inline query needs to be reused in several queries, defining a view is a better alternative.
Views
A view is a database object used to encapsulate a query and that can be referenced from other queries much like a table. In fact, it's also referred as a "virtual table". A view can't be execute by itself (as stored procedures do. No data, only the definition of the view is stored, and the various actions that can be performed on database objects can be performed on views as well.
-- creating the view
CREATE VIEW dbo.vOpenPurchaseOrders
AS
SELECT POL.ProductId
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId
-- testing the view
SELECT top 10 *
FROM dbo.vOpenPurchaseOrders
Once the view is created, it can be called from any query:
-- products with open purchase orders (table-valued function)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
JOIN dbo.vOpenPurchaseOrders POL
ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber
Besides the schema binding, there are no additional costs for using views. However, views have several limitations (see [2]). Moreover, it's not possible to use parameters with views, scenarios in which tabled-valued UDFs can help.
Indexed Views
Starting with SQL Server 2015, it's possible to materialize the data in a view, storing the results of the view in a clustered index on the disk in same way a table with a clustered index is stored. This type of view is called an indexed view (aka materialized view, though the concept is used slightly different in Azure Synapse) and for long-running queries can provide considerable performance gains. In case the view contains a GROUP BY is present, its definition must contain COUNT_BIG(*) and must not contain HAVING.
-- dropping the view
--DROP VIEW IF EXISTS Purchasing.vOpenPOs
-- create view
CREATE VIEW Purchasing.vOpenPOs
WITH SCHEMABINDING
AS
SELECT POL.ProductId
, SUM(POL.OrderQty) PurchQty
, COUNT_BIG(*) Count
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId
GO
--Create an index on the view.
CREATE UNIQUE CLUSTERED INDEX IDX_vOpenPOs
ON Purchasing.vOpenPOs (ProductId);
--testing the view
SELECT top 100 *
FROM Purchasing.vOpenPOs
-- products with open purchase orders (indexed view)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
JOIN Purchasing.vOpenPOs POL
ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber
When an indexed view is defined on a table, the query optimizer may use it to speed up the query execution even if it wasn't referenced in the query. Besides the restriction of the view to be deterministic, further limitations apply (see [6]).
Table-Valued Functions
A table-valued function is a user-defined function in which returns a table as a result, as opposed to a single data value, as scalar functions do.
Let's support that we need to restrict base the logic based on a time interval. We'd need then to provide the StartDate & EndDate as parameters. Compared with other UDFs table-valued functions, as their name implies, need to return a table:
-- creating the UDF function
CREATE FUNCTION dbo.tvfOpenPurchaseOrdersByProduct(
@StartDate date
, @EndDate date)
RETURNS TABLE
AS RETURN (
SELECT POL.ProductId
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
AND POL.DueDate BETWEEN @StartDate AND @EndDate
GROUP BY POL.ProductId
)
-- testing the UDF
SELECT top 10 *
FROM dbo.tvfOpenPurchaseOrdersByProduct('2014-01-01', '2014-12-31')
A table-valued function can be used as a "table with parameters" in JOINs:
-- products with open purchase orders (table-valued function)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
JOIN dbo.tvfOpenPurchaseOrdersByProduct('2014-01-01', '2014-12-31') POL
ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber
The parameters are optional, though in such cases using a view might still be a better idea. Table-valued functions used to have poor performance in the past compared with views and in certain scenarios they might still perform poorly. Their benefit resides in allowing to pass and use parameters in the logic, which can make them irreplaceable. Moreover, multi-statement table-valued functions can be built as well (see example)!
Notes:
1) When evaluating table-valued functions for usage consider their limitations as well (see [3])!
2) Scalar UDFs can be used to simplify the code as well, though they apply only to single values, therefore they are not considered in here!
Temporary Tables
A temporary table is a base table that is stored and managed in tempdb as any other table. It exists only while the database session in which it was created is active. Therefore, it can be called multiple times, behaving much like a standard table:
-- create the temp table
CREATE TABLE dbo.#OpenPOs (
ProductId int NOT NULL
, PurchQty decimal(8,2) NOT NULL
)
-- insert the cumulated purchase orders
INSERT INTO #OpenPOs
SELECT POL.ProductId
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId
-- products with open purchase orders (table-valued function)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
JOIN dbo.#OpenPOs POL
ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber
-- drop the table (cleaning)
-- DROP TABLE IF EXISTS dbo.#OpenPOs;
Being created in the tempdb, system database shared by several databases, temporary table's performance relies on tempdb's configuration and workload. Moreover, the concurrent creation of temporary tables from many sessions can lead to tempdb metadata contention, as each session attempts updating metadata information in the system based tables.
Temporary tables are logged, which adds more burden on the database engine, however being able to create indexes on them and use statistics can help processing result sets more efficiently, especially when called multiple times.
Also, a temporary table might be cached (see [1]) and not deleted when its purpose ends, which allows operations that drop and create the objects to execute very quickly and reduces page allocation contention.
Table Variables
A table variable is a variable of type TABLE and can be used in functions, stored procedures, and batches. The construct is similar to the temp table and is stored as well in the tempdb and cached under certain scenarios, however they are scoped to the batch or routine in which they are defined and destroyed after that.
-- create the table variable
DECLARE @OpenPOs TABLE (
ProductId int NOT NULL
, PurchQty decimal(8,2) NOT NULL
)
-- insert the cumulated purchase orders
INSERT INTO @OpenPOs
SELECT POL.ProductId
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId
-- products with open purchase orders (table variable)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
JOIN @OpenPOs POL
ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber
Table variables don’t participate in transactions or locking, while DML operations done on them are not logged. There are also no statistics maintained and any data changes impacting the table variable will not cause recompilation. Thus, they are usually faster than temporary variables, especially when their size is small, though their performance depends also on how they are used. On the other side, for big result sets and/or when several calls are involved, a temporary table could prove to be more efficient.
Important!!! Temporary tables and table variables are means of improving the performance of long-running queries. Being able to move pieces of logic around helps in maintaining the code and it also provides a logical structure of the steps, however they shouldn't be used if the performance gain is not the target! Overusing them as technique can considerably decrease the performance of tempdb, which can have impact in other areas!
Azure Synapse
Moving to Azure Synapse there are several important limitations in what concerns the above (see [4]). Even if some features are supported, further limitations might apply. What's important to note is that materialized views act like indexed view in standard SQL Server and that CETAS (Create External Table as SELECT) are available to import/export data to the supported file formats in Hadoop, Azure storage blob or Azure Data Lake Storage Gen2.
Feature | Dedicated | Serverless | SQL Server |
CTEs | Yes | Yes | Yes (2015+) |
Recursive CTEs | No | No | Yes (2015+) |
Views | Yes | Yes | Yes |
Indexed views | No | No | Yes |
Materialized views | Yes | No | No |
Table-valued functions (single statement) | No | Yes | Yes |
Table-valued functions (multi-statement) | No | No | Yes |
Scalar UDFs | Yes | No | Yes |
Tables | Yes | No | Yes |
Temporary tables (local) | Yes | Limited | Yes |
Temporary tables (global) | No | No | Yes |
Table variables | Yes | Yes | Yes |
CETAS | Yes | Limited | Yes (2022+) |
Notes:
1) CETAS have two important limitations in serverless SQL Pool
a) once the data were exported to a file, they can't be overwritten via the same syntax;
b) logic based on temporary tables can't be exported via pipelines.
2) Temporary tables can be used to replace cursors (see example).
Previous Post
<<||>>
Next Post
Resources:
[1] Microsoft Learn (2012) Capacity Planning for tempdb (link)
[2] Microsoft Learn (2023) CREATE View (link)
[3] Microsoft Learn (2023) CREATE Function (link)
[4] Microsoft Learn (2023) Transact-SQL features supported in Azure Synapse SQL (link)
[5] Redgate (2018) Choosing Between Table Variables and Temporary Tables (ST011, ST012), by Phil Factor (link)
[6] Microsoft Learn (2023) Create indexed views (link)
[7] Microsoft Learn (2023) CREATE MATERIALIZED VIEW AS SELECT (Transact-SQL) (link)
[8] Microsoft Learn (2023) CETAS with Synapse SQL (link)