SQL Troubles: table variable


27 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part IV: Extracting vs. Concatenating Emails)

As shown in the previous two posts, Copilot helps to generate code, even if this may involve multiple prompts and iterations. It works, however, and beginners can probably learn a lot from the code generated in the process. The current post provides some scenarios you can try in Copilot or any other prompt-based AI.

Disclaimer: The below code was generated by Copilot via prompts for demonstration purposes!

Use Case 1: Extracting Concatenated Emails

Sometimes one needs to work with distribution lists (lists of emails usually delimited by ";") and extract the individual emails for various purposes (e.g. validation, user identification, etc.). The simplest solution is to use the STRING_SPLIT function introduced in SQL Server 2016, while earlier solutions involved parsing the string within a loop as part of a stored procedure or user-defined function (UDF). When the lists aren't formatted consistently, parsing the strings within loops can still be a feasible approach.

Prompt:
Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the procedure.

Output:

-- stored procedure generated by Copilot
CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Create a table to store the individual emails
    CREATE TABLE #EmailTable (Email NVARCHAR(255));

    -- Split the email list and insert into the table
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO #EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END


    -- Return the result set
    SELECT Email FROM #EmailTable;

    -- Clean up
    DROP TABLE #EmailTable;
END;
GO

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

Prompt:
Create a user-defined function that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the function.

Output:

-- create the function
CREATE FUNCTION dbo.SplitEmails (@EmailList NVARCHAR(MAX))
RETURNS @EmailTable TABLE (Email NVARCHAR(255))
AS
BEGIN
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO @EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END

    RETURN;
END;
GO

-- Test the SplitEmails function
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

SELECT * FROM dbo.SplitEmails(@EmailList);

Of the two solutions, the table-valued function (TVF) probably offers the most flexibility, as it can be integrated more easily into further logic. It would also be interesting to perform some performance tests. As long as the data volume is negligible, a UDF is the way to go!

However, one can directly use the String_Split function. One can wonder why the function wasn't used in the first place?!

Prompt:

Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails using the String_Split functio. Write the code to test the procedure.

Output:

CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Use STRING_SPLIT to split the email list into individual emails
    SELECT value AS Email
    FROM STRING_SPLIT(@EmailList, ';');
END;
GO

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

This is probably a good example to show that one should start with a narrow question and expand it as needed, though this also depends on the problem to solve.
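For completeness, STRING_SPLIT can also be called directly, without any procedure around it. The following is only a minimal sketch: the table variable, its columns and the sample values are made up for illustration, and the trimming and the filter on empty values are my own additions to cope with stray spaces and trailing delimiters.

```sql
-- hypothetical distribution lists stored per customer (assumed data)
DECLARE @Lists TABLE (
  CustomerId int NOT NULL
, EmailList nvarchar(max) NOT NULL
);

INSERT INTO @Lists (CustomerId, EmailList)
VALUES (1, N'email1@example.com; email2@example.com;')
, (2, N'email3@example.com');

-- one row per customer and email
SELECT LST.CustomerId
, LTRIM(RTRIM(SPL.value)) Email
FROM @Lists LST
     CROSS APPLY STRING_SPLIT(LST.EmailList, ';') SPL
WHERE LTRIM(RTRIM(SPL.value)) <> '';
```

Note that STRING_SPLIT requires a database compatibility level of at least 130 and that the order of the returned rows is not guaranteed.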

Use Case 2: Concatenate Emails from Table

It's often needed to concatenate the emails for a Customer or other entity in order to send standard emails to the associated parties. Over the years I wrote several posts on the topic, long before the STRING_AGG function was introduced in SQL Server 2017. In the examples below, STRING_AGG was used for its simplicity instead of a manual concatenation, even though the function can be called directly, without the need for other scaffolding.
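As an illustration of calling the function directly, the sketch below concatenates the emails per customer. The table variable, its columns and the values are assumptions made for the example, while WITHIN GROUP is used only to obtain a predictable order inside each list.

```sql
-- hypothetical emails stored per customer (assumed data)
DECLARE @CustomerEmails TABLE (
  CustomerId int NOT NULL
, Email nvarchar(255) NOT NULL
);

INSERT INTO @CustomerEmails (CustomerId, Email)
VALUES (1, N'email1@example.com')
, (1, N'email2@example.com')
, (2, N'email3@example.com');

-- one distribution list per customer
SELECT CustomerId
, STRING_AGG(Email, ';') WITHIN GROUP (ORDER BY Email) EmailList
FROM @CustomerEmails
GROUP BY CustomerId;
```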

In the below examples one can see how a small change in the prompt can have an important impact on the solution generated.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table. Provide the code to test the function.

Output:

-- create the user defined function
CREATE FUNCTION dbo.ConcatenateEmails()
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM YourEmailTable;

    RETURN @EmailList;
END;
GO

-- test the function
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmails function
SELECT dbo.ConcatenateEmails() AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table provided as parameter. Provide the code to test the function.

Output:

CREATE FUNCTION dbo.ConcatenateEmailsFromSP(@TableName NVARCHAR(128))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX);
    DECLARE @EmailList NVARCHAR(MAX);

    SET @SQL = 'SELECT @EmailList = STRING_AGG(Email, '';'') FROM ' + @TableName;

    EXEC sp_executesql @SQL, N'@EmailList NVARCHAR(MAX) OUTPUT', @EmailList OUTPUT;

    RETURN @EmailList;
END;

-- testing the code
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');


-- Test the ConcatenateEmailsFromSP function
SELECT dbo.ConcatenateEmailsFromSP('YourEmailTable') AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Running the create statements leads however to the following error message:

"Msg 557, Level 16, State 2, Line 3, Only functions and some extended stored procedures can be executed from within a function."

At least in the previous versions of SQL Server, sp_executesql couldn't be executed from a UDF. Even if it's possible to write such code, it will not work, as the example shows! One can only wonder what led to this solution.
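If the table name really must be provided as a parameter, a stored procedure with an output parameter is one possible alternative, as dynamic SQL is allowed there. The code below is a sketch of my own and not Copilot's output; the procedure name is hypothetical, the table is assumed to have an Email column, and QUOTENAME is applied to reduce the risk of SQL injection (which also means that only a single-part table name is expected).

```sql
-- stored procedure as alternative to the failing UDF (hypothetical name)
CREATE PROCEDURE dbo.ConcatenateEmailsFromTable
    @TableName sysname
,   @EmailList nvarchar(max) OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @SQL nvarchar(max);

    -- QUOTENAME wraps the name in brackets before it is concatenated
    SET @SQL = N'SELECT @EmailList = STRING_AGG(Email, '';'') FROM ' + QUOTENAME(@TableName);

    EXEC sp_executesql @SQL
    , N'@EmailList nvarchar(max) OUTPUT'
    , @EmailList = @EmailList OUTPUT;
END;
GO

-- create a sample table and insert some test data
CREATE TABLE dbo.YourEmailTable (Email nvarchar(255));

INSERT INTO dbo.YourEmailTable (Email)
VALUES ('email1@example.com')
, ('email2@example.com')
, ('email3@example.com');

-- test the procedure
DECLARE @Result nvarchar(max);

EXEC dbo.ConcatenateEmailsFromTable @TableName = N'YourEmailTable'
, @EmailList = @Result OUTPUT;

SELECT @Result AS EmailList;

-- clean up
DROP TABLE dbo.YourEmailTable;
DROP PROCEDURE dbo.ConcatenateEmailsFromTable;
```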

Prompt:

Create a function in SQL Server that concatenates in a list the emails from a table variable provided as parameter. Provide the code to test the function.

Output:

CREATE FUNCTION dbo.ConcatenateEmailsFromTableVar(@EmailTable TABLE (Email NVARCHAR(255)))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;

Executing the code results in the following error message:

"Msg 156, Level 15, State 1, Line 1, Incorrect syntax near the keyword 'TABLE'."

Checking the documentation, the following limitation should be observed [1]:

"Table-valued parameters must be passed as input READONLY parameters to Transact-SQL routines. You cannot perform DML operations such as UPDATE, DELETE, or INSERT on a table-valued parameter in the body of a routine." [1]

The correct usage can be seen in the next example.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table-valued parameter. Provide the code to test the function.

Output:

-- create a table type for the table-valued parameter:
CREATE TYPE EmailTableType AS TABLE 
(
    Email NVARCHAR(255)
)
GO

-- create the scalar function that takes the table-valued parameter
CREATE FUNCTION dbo.ConcatenateEmailsFromTVP(@EmailTable EmailTableType READONLY)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;
GO

-- Declare a table variable of the table type and insert some test data
DECLARE @EmailTable EmailTableType;

INSERT INTO @EmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmailsFromTVP function
SELECT dbo.ConcatenateEmailsFromTVP(@EmailTable) AS EmailList;

Final Thoughts

See a previous post for the uses of the complementary STRING_SPLIT and STRING_AGG functions. If one doesn't know what one is doing, it's easy to go down the wrong path(s) or to choose suboptimal solutions, even when assisted by AI. If blunders can happen for simple cases, what will happen when the solutions are hundreds or thousands of times more complex? How will the blunders multiply?

As can be seen from the examples that don't work, Copilot has more problems besides hallucinations. Even if some solutions are semantically viable, the code might break on contact with reality! Moreover, even if something works, one needs to weigh the solutions in terms of performance and use specific techniques to address the issues found! AI is more likely feasible when working with high-level programming languages like C++ or Java, which hopefully offer fewer chances for mistakes.

Once the code templates are generated, the code can be extended as needed. Frankly, even if the code is quite basic, the result is impressive considering the small amount of work involved in generating it. One can only wonder whether the code was taken from a source, was part of the training, or is the result of the AI heuristics. It's probably a combination of all three.

Even if Copilot generated the template, the developers still need to be able to implement it in the broader solution, which sometimes might be more challenging than expected, especially when the details matter. The devil is in the details after all! Developers must also recognize which solution is more appropriate in terms of flexibility, maintenance, performance or whatever criteria may apply. I was wondering whether one can change the code to a degree that one can't recognize anymore that it was written by Copilot. Any ideas?

It's to be expected that in the near future such tools will be able to provide more complex solutions. I will not say that programming is dead, as some proclaim, though its focus will shift toward prompt engineering. Somebody, and I mean here the programmer or another technical person, must be able to evaluate the results critically and qualitatively, and identify the blunders and other types of issues that appear in the process!

Happy coding!


References:
[1] Microsoft Learn (2024) SQL Server 2022: Use table-valued parameters [link]

Acronyms:
DML - Data Manipulation Language
TVF - Table-Valued Function
UDF - User-Defined Function

01 February 2023

💎SQL Reloaded: Alternatives for Better Code Maintainability in SQL Server & Azure Synapse I

Introduction

Queries can become quite complex, and with the increased complexity they become harder to read and/or maintain. Since the early days of SQL Server, views and table-valued user-defined functions (UDFs) have been the standard ways of encapsulating logic for reuse in queries, which minimizes the duplication of logic. Then came the common table expressions (CTEs), which allow a further layer of structuring the code, regardless of whether a view or UDF is used.

These are the main 3 options that can be combined in various ways to better structure the code. On the other side, also a temporary table or table variable could be used for the same purpose, though they have further implications.

To exemplify the various approaches, let's consider a simple query based on two tables from the AdventureWorks database. For the sake of simplicity, further business rules have been left out.

Inline Subqueries

-- products with open purchase orders
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN ( -- cumulated open purchase orders by product
		SELECT POL.ProductId 
		, SUM(POL.OrderQty) PurchQty
		FROM Purchasing.PurchaseOrderDetail POL
		WHERE OrderQty - (ReceivedQty - RejectedQty)>0
		GROUP BY POL.ProductId 
	) POL
	ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

As can be seen, the logic for the "Open purchase orders" result set is built within an inline subquery (aka inline view). As its logic becomes more complex, the simplest way to handle this is to move it into a CTE.

Common Table Expressions (CTEs)

A common table expression can be thought of as a temporary result set defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE or CREATE VIEW statement [1]. Thus, the CTE can't be reused between queries.

The inline query is moved to the beginning of the statement, into a WITH clause, and is given a proper name that allows easier identification later:

-- products with open purchase orders (common table expression)
WITH OpenPOs
AS (-- cumulated open purchase orders by product
	SELECT POL.ProductId 
	, SUM(POL.OrderQty) PurchQty
	FROM Purchasing.PurchaseOrderDetail POL
	WHERE OrderQty - (ReceivedQty - RejectedQty)>0
	GROUP BY POL.ProductId 
)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN OpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

Thus, this allows us to rewrite the JOIN as if it were between two tables. Multiple CTEs can be used as well, with or without any dependencies between them. Moreover, CTEs allow building recursive queries (see example).
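To illustrate both points, here is a minimal sketch that doesn't depend on any table: a recursive CTE generates a few numbers and a second CTE builds on the first. The names Numbers and Squares are made up for the example.

```sql
-- two CTEs, the second depending on the first (hypothetical names)
WITH Numbers (n)
AS (-- recursive CTE generating the numbers from 1 to 5
	SELECT 1
	UNION ALL
	SELECT n + 1
	FROM Numbers
	WHERE n < 5
)
, Squares
AS (-- second CTE based on the first one
	SELECT n
	, n * n Square
	FROM Numbers
)
SELECT n
, Square
FROM Squares
ORDER BY n;
```

The anchor member (SELECT 1) provides the starting row, while the recursive member is evaluated repeatedly until the WHERE condition stops returning rows.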

There is no performance gain or loss from using a CTE. It's important to know that the result set is not cached; therefore, if the same CTE is referenced multiple times within a query, it will also be "executed" the same number of times. Except for the cases in which the database engine uses a spool operator to save intermediate query results for a CTE, no work table is created in tempdb for CTEs.

If the inline query needs to be reused in several queries, defining a view is a better alternative.

Views

A view is a database object that encapsulates a query and that can be referenced from other queries much like a table. In fact, it's also referred to as a "virtual table". A view can't be executed by itself (as stored procedures can). Only the definition of the view is stored, not the data, and the various actions that can be performed on database objects can be performed on views as well.

-- creating the view
CREATE VIEW dbo.vOpenPurchaseOrders
AS
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 

-- testing the view
SELECT top 10 *
FROM dbo.vOpenPurchaseOrders

Once the view is created, it can be called from any query:

-- products with open purchase orders (view)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN dbo.vOpenPurchaseOrders POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

Besides the schema binding, there are no additional costs for using views. However, views have several limitations (see [2]). Moreover, it's not possible to use parameters with views, a scenario in which table-valued UDFs can help.

Indexed Views

Starting with SQL Server 2000, it's possible to materialize the data in a view by storing its results in a clustered index on disk, in the same way a table with a clustered index is stored. This type of view is called an indexed view (aka materialized view, though the concept is used slightly differently in Azure Synapse) and can provide considerable performance gains for long-running queries. If the view contains a GROUP BY clause, its definition must contain COUNT_BIG(*) and must not contain HAVING.

-- dropping the view
--DROP VIEW IF EXISTS Purchasing.vOpenPOs

-- create view
CREATE VIEW Purchasing.vOpenPOs
WITH SCHEMABINDING
AS
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
, COUNT_BIG(*) Count
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 
GO

--Create an index on the view.
CREATE UNIQUE CLUSTERED INDEX IDX_vOpenPOs
   ON Purchasing.vOpenPOs (ProductId);

--testing the view
SELECT top 100 *
FROM Purchasing.vOpenPOs

-- products with open purchase orders (indexed view)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
     JOIN Purchasing.vOpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

When an indexed view is defined on a table, the query optimizer may use it to speed up the query execution even if it wasn't referenced in the query. Besides the restriction of the view to be deterministic, further limitations apply (see [6]).
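Whether the optimizer considers the indexed view automatically depends, as far as I know, on the edition of SQL Server in use. When it doesn't, the view can be referenced directly with the NOEXPAND table hint, which tells the optimizer to treat the view like a table with a clustered index instead of expanding its definition. A sketch based on the view created above:

```sql
-- products with open purchase orders (indexed view with NOEXPAND hint)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN Purchasing.vOpenPOs POL WITH (NOEXPAND)
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber;
```

Comparing the execution plans with and without the hint shows whether the clustered index of the view is actually used.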

Table-Valued Functions

A table-valued function is a user-defined function that returns a table as a result, as opposed to a single data value, as scalar functions do.

Let's suppose that we need to restrict the logic to a time interval. We'd then need to provide the StartDate & EndDate as parameters. Compared with other UDFs, table-valued functions, as their name implies, need to return a table:

-- creating the UDF function 
CREATE FUNCTION dbo.tvfOpenPurchaseOrdersByProduct( 
  @StartDate date 
, @EndDate date) 
RETURNS TABLE 
AS RETURN ( 
	SELECT POL.ProductId 
	, SUM(POL.OrderQty) PurchQty
	FROM Purchasing.PurchaseOrderDetail POL
	WHERE OrderQty - (ReceivedQty - RejectedQty)>0
	  AND POL.DueDate BETWEEN @StartDate AND @EndDate
	GROUP BY POL.ProductId 
)

-- testing the UDF
SELECT top 10 *
FROM dbo.tvfOpenPurchaseOrdersByProduct('2014-01-01', '2014-12-31')

A table-valued function can be used as a "table with parameters" in JOINs:

-- products with open purchase orders (table-valued function)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM Production.Product ITM
     JOIN dbo.tvfOpenPurchaseOrdersByProduct('2014-01-01', '2014-12-31') POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

The parameters are optional, though in such cases using a view might still be a better idea. Table-valued functions used to perform poorly compared with views, and in certain scenarios they might still do so. Their benefit resides in the ability to pass parameters and use them in the logic, which can make them irreplaceable. Moreover, multi-statement table-valued functions can be built as well (see example)!
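For comparison, a multi-statement variant of the same logic could look like the sketch below. The function name and the second step (removing small quantities) are assumptions made only to show that several statements can be chained before the result is returned.

```sql
-- creating a multi-statement table-valued function (hypothetical name)
CREATE FUNCTION dbo.tvfOpenPurchaseOrdersMultiStatement(
  @StartDate date
, @EndDate date)
RETURNS @OpenPOs TABLE (
  ProductId int NOT NULL PRIMARY KEY
, PurchQty int NOT NULL)
AS
BEGIN
	-- step 1: cumulated open purchase orders by product
	INSERT INTO @OpenPOs (ProductId, PurchQty)
	SELECT POL.ProductId
	, SUM(POL.OrderQty) PurchQty
	FROM Purchasing.PurchaseOrderDetail POL
	WHERE POL.OrderQty - (POL.ReceivedQty - POL.RejectedQty) > 0
	  AND POL.DueDate BETWEEN @StartDate AND @EndDate
	GROUP BY POL.ProductId;

	-- step 2: further statements can refine the result (assumed business rule)
	DELETE FROM @OpenPOs
	WHERE PurchQty < 10;

	RETURN;
END;
GO

-- testing the function
SELECT TOP 10 *
FROM dbo.tvfOpenPurchaseOrdersMultiStatement('2014-01-01', '2014-12-31');
```

The result is materialized in the returned table variable, which is one of the reasons why such functions may perform worse than their inline counterparts.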

Notes:
1) When evaluating table-valued functions for usage consider their limitations as well (see [3])!
2) Scalar UDFs can be used to simplify the code as well, though they apply only to single values, therefore they are not considered in here!

Temporary Tables

A temporary table is a base table that is stored and managed in tempdb as any other table. It exists only while the database session in which it was created is active. Therefore, it can be called multiple times, behaving much like a standard table:

-- create the temp table
CREATE TABLE dbo.#OpenPOs (
  ProductId int NOT NULL
, PurchQty decimal(8,2) NOT NULL
)

-- insert the cumulated purchase orders
INSERT INTO #OpenPOs
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 

-- products with open purchase orders (temporary table)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
     JOIN dbo.#OpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

-- drop the table (cleaning)
-- DROP TABLE IF EXISTS dbo.#OpenPOs;

Being created in tempdb, a system database shared by several databases, a temporary table's performance relies on tempdb's configuration and workload. Moreover, the concurrent creation of temporary tables from many sessions can lead to tempdb metadata contention, as each session attempts to update the metadata in the system base tables.

Temporary tables are logged, which adds more burden on the database engine, however being able to create indexes on them and use statistics can help processing result sets more efficiently, especially when called multiple times.
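Creating an index on a temporary table works the same way as for a standard table. The following is a minimal, self-contained sketch in which the table name, the index name and the values are made up:

```sql
-- create a temp table with made-up data (hypothetical names and values)
CREATE TABLE #OpenPOsDemo (
  ProductId int NOT NULL
, PurchQty int NOT NULL
);

INSERT INTO #OpenPOsDemo (ProductId, PurchQty)
VALUES (1, 100)
, (2, 250)
, (3, 75);

-- the index is created after the load, together with its statistics
CREATE UNIQUE CLUSTERED INDEX IX_OpenPOsDemo_ProductId
   ON #OpenPOsDemo (ProductId);

-- lookups by ProductId can now use the index
SELECT ProductId
, PurchQty
FROM #OpenPOsDemo
WHERE ProductId = 2;

-- cleaning up
DROP TABLE IF EXISTS #OpenPOsDemo;
```

For a handful of rows the index makes no difference; the benefit appears with bigger result sets that are queried several times.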

Also, a temporary table might be cached (see [1]) and not deleted when its purpose ends, which allows operations that drop and create the objects to execute very quickly and reduces page allocation contention.

Table Variables

A table variable is a variable of type TABLE and can be used in functions, stored procedures, and batches. The construct is similar to the temporary table: it is also stored in tempdb and cached under certain scenarios; however, it is scoped to the batch or routine in which it is defined and is destroyed afterwards.

-- create the table variable
DECLARE @OpenPOs TABLE (
  ProductId int NOT NULL
, PurchQty decimal(8,2) NOT NULL
)

-- insert the cumulated purchase orders
INSERT INTO @OpenPOs
SELECT POL.ProductId 
, SUM(POL.OrderQty) PurchQty
FROM Purchasing.PurchaseOrderDetail POL
WHERE OrderQty - (ReceivedQty - RejectedQty)>0
GROUP BY POL.ProductId 

-- products with open purchase orders (table variable)
SELECT ITM.ProductNumber
, ITM.Name
, POL.PurchQty
FROM [Production].[Product] ITM
     JOIN @OpenPOs POL
	   ON ITM.ProductId = POL.ProductId
ORDER BY ITM.ProductNumber

Table variables don’t participate in user transactions (a rollback doesn't undo the changes made to them) and they need less locking and logging than temporary tables. There are also no statistics maintained, and data changes affecting a table variable will not cause recompilation. Thus, they are usually faster than temporary tables, especially when their size is small, though their performance also depends on how they are used. On the other side, for big result sets and/or when several calls are involved, a temporary table could prove to be more efficient.
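The different behavior in transactions can be checked with a small test. In the sketch below (names made up for the example) the same row is inserted into a table variable and into a temporary table within a transaction that is then rolled back:

```sql
-- table variable vs. temporary table within a rolled back transaction
DECLARE @TableVariable TABLE (id int NOT NULL);

-- the temp table is created before the transaction, so it survives the rollback
CREATE TABLE #TempTable (id int NOT NULL);

BEGIN TRANSACTION;

INSERT INTO @TableVariable (id) VALUES (1);
INSERT INTO #TempTable (id) VALUES (1);

ROLLBACK TRANSACTION;

-- the table variable keeps its row, the temp table is empty again
SELECT (SELECT COUNT(*) FROM @TableVariable) TableVariableRows -- 1
, (SELECT COUNT(*) FROM #TempTable) TempTableRows; -- 0

-- cleaning up
DROP TABLE #TempTable;
```

This behavior is sometimes exploited to keep log or error information that must survive a rollback.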

Important!!! Temporary tables and table variables are means of improving the performance of long-running queries. Being able to move pieces of logic around helps in maintaining the code and also provides a logical structure of the steps; however, they shouldn't be used if a performance gain is not the target! Overusing them as a technique can considerably decrease the performance of tempdb, which can have an impact in other areas!

Azure Synapse

Moving to Azure Synapse, there are several important limitations concerning the above (see [4]). Even if some features are supported, further limitations might apply. What's important to note is that materialized views act like indexed views in standard SQL Server, and that CETAS (Create External Table as SELECT) is available to import/export data to the supported file formats in Hadoop, Azure storage blob or Azure Data Lake Storage Gen2.

Feature	Dedicated	Serverless	SQL Server
CTEs	Yes	Yes	Yes (2005+)
Recursive CTEs	No	No	Yes (2005+)
Views	Yes	Yes	Yes
Indexed views	No	No	Yes
Materialized views	Yes	No	No
Table-valued functions (single statement)	No	Yes	Yes
Table-valued functions (multi-statement)	No	No	Yes
Scalar UDFs	Yes	No	Yes
Tables	Yes	No	Yes
Temporary tables (local)	Yes	Limited	Yes
Temporary tables (global)	No	No	Yes
Table variables	Yes	Yes	Yes
CETAS	Yes	Limited	Yes (2022+)

Notes:
1) CETAS has two important limitations in the serverless SQL pool:
a) once the data have been exported to a file, they can't be overwritten via the same syntax;
b) logic based on temporary tables can't be exported via pipelines.
2) Temporary tables can be used to replace cursors (see example).


Resources:
[1] Microsoft Learn (2012) Capacity Planning for tempdb (link)
[2] Microsoft Learn (2023) CREATE View (link)
[3] Microsoft Learn (2023) CREATE Function (link)
[4] Microsoft Learn (2023) Transact-SQL features supported in Azure Synapse SQL (link)
[5] Redgate (2018) Choosing Between Table Variables and Temporary Tables (ST011, ST012), by Phil Factor (link)
[6] Microsoft Learn (2023) Create indexed views (link)
[7] Microsoft Learn (2023) CREATE MATERIALIZED VIEW AS SELECT (Transact-SQL) (link)
[8] Microsoft Learn (2023) CETAS with Synapse SQL (link)

18 October 2022

💎SQL Reloaded: Successive Price Increases/Discounts via Windowing Functions and CTEs

I was trying today to solve a problem that apparently requires recursive common table expressions, though they are not (yet) available in the Azure Synapse serverless SQL pool. The problem can be summarized by the table definition below: given a set of Products with an initial Sales price, one needs to apply Price Increases successively for each Cycle. The cumulated increase is simulated in the last column for each line.

Unfortunately, there is no SQL Server windowing function that allows multiplying the values of a column incrementally (similar to how a running total works). However, there’s a mathematical trick that can be used to transform a product into a sum of elements by applying the Exp (exponential) and Log (logarithm) functions (see Solution 1), and which frankly is more elegant than applying CTEs (see Solution 2).
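The trick relies on the logarithm turning a product into a sum. For positive factors the identity can be written as follows, with the increases of the first item over its four cycles as a worked example (the final value is my own calculation):

```latex
\prod_{i=1}^{n} x_i = \exp\left(\sum_{i=1}^{n} \ln x_i\right), \qquad x_i > 0

% worked example for item ID001, cycle 4
1.02 \cdot 1.03 \cdot 1.03 \cdot 1.04
  = \exp(\ln 1.02 + \ln 1.03 + \ln 1.03 + \ln 1.04)
  = 1.12540272
```

As the Exp and Log functions work with floating point numbers, the values returned by the query can differ slightly in the last decimals from the exact product.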

-- create table with test data
SELECT *
INTO dbo.ItemPrices
FROM (VALUES ('ID001', 1000, 1, 1.02, '1.02')
, ('ID001', 1000, 2, 1.03, '1.02*1.03')
, ('ID001', 1000, 3, 1.03, '1.02*1.03*1.03')
, ('ID001', 1000, 4, 1.04, '1.02*1.03*1.03*1.04')
, ('ID002', 100, 1, 1.02, '1.02')
, ('ID002', 100, 2, 1.03, '1.02*1.03')
, ('ID002', 100, 3, 1.04, '1.02*1.03*1.04')
, ('ID002', 100, 4, 1.05, '1.02*1.03*1.04*1.05')
) DAT (ItemId, SalesPrice, Cycle, PriceIncrease, CumulatedIncrease)

-- reviewing the data
SELECT *
FROM dbo.ItemPrices

-- Solution 1: new sales prices with log & exp
SELECT ItemId
, SalesPrice
, Cycle
, PriceIncrease
, EXP(SUM(Log(PriceIncrease)) OVER(PARTITION BY Itemid ORDER BY Cycle)) CumulatedIncrease
, SalesPrice * EXP(SUM(Log(PriceIncrease)) OVER(PARTITION BY Itemid ORDER BY Cycle)) NewSalesPrice
FROM dbo.ItemPrices

-- Solution 2: new sales prices with recursive CTE
;WITH CTE 
AS (
-- initial record
SELECT ITP.ItemId
, ITP.SalesPrice
, ITP.Cycle
, ITP.PriceIncrease
, cast(ITP.PriceIncrease as decimal(38,6)) CumulatedIncrease
FROM dbo.ItemPrices ITP
WHERE ITP.Cycle = 1
UNION ALL
-- recursive part
SELECT ITP.ItemId
, ITP.SalesPrice
, ITP.Cycle
, ITP.PriceIncrease
, Cast(ITP.PriceIncrease * ITO.CumulatedIncrease as decimal(38,6))  CumulatedIncrease
FROM dbo.ItemPrices ITP
    JOIN CTE ITO
	  ON ITP.ItemId = ITO.ItemId
	 AND ITP.Cycle-1 = ITO.Cycle
)
-- final result
SELECT ItemId
, SalesPrice
, Cycle
, PriceIncrease
, CumulatedIncrease
, SalesPrice * CumulatedIncrease NewSalesPrice
FROM CTE
ORDER BY ItemId
, Cycle


-- validating the cumulated price increases (only last ones)
SELECT 1.02*1.03*1.03*1.04 
, 1.02*1.03*1.04*1.05

-- cleaning up
DROP TABLE IF EXISTS dbo.ItemPrices

Notes:
1. The logarithm in SQL Server’s implementation works only with positive numbers!
2. For simplification I transformed percentages (e.g. 1%) into values that are easier to multiply with (e.g. 1.01). The solution can be easily modified to consider discounts.
3. When CTEs are not available, one is forced to return to the methods used in SQL Server 2000 (I've been there) and thus use temporary tables or table variables with loops. Moreover, the logic can be encapsulated in multi-statement table-valued functions (see example), unfortunately another feature not (yet) supported by serverless SQL pools.
4. Unfortunately, STRING_AGG, which concatenates values across rows, works only as an aggregate (e.g. with a GROUP BY clause) and not as a windowing function. Anyway, its result is useless without the availability of an Eval function in SQL (see example); however, the Expr function available in data flows could be used as a workaround.
5. Even if I thought about the use of logarithms for transforming the product into a sum, I initially ignored the idea, thinking that the solution would be too complex to implement. So, the credit goes to another blogpost. Kudos!
6. The queries also work in SQL databases in Microsoft Fabric. Just replace the Sales with SalesLT schema (see post, respectively GitHub repository with the changed code).
7. The Product function was introduced with SQL Server 2025, Azure SQL Database and SQL databases in Microsoft Fabric.
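Regarding the discounts mentioned in the notes, one possible adaptation is to store the changes as percentages, with negative values for discounts, and to derive the factor within the query. The sketch below uses a made-up table variable and values, and works only as long as each change is greater than -100%, so that the logarithm receives a positive number:

```sql
-- hypothetical price changes in percentages (negative values are discounts)
DECLARE @PriceChanges TABLE (
  ItemId varchar(10) NOT NULL
, SalesPrice decimal(18,2) NOT NULL
, Cycle int NOT NULL
, ChangePct decimal(9,4) NOT NULL
);

INSERT INTO @PriceChanges (ItemId, SalesPrice, Cycle, ChangePct)
VALUES ('ID001', 1000, 1, 2)
, ('ID001', 1000, 2, -5)
, ('ID001', 1000, 3, 3);

-- new sales prices with log & exp, the factor being derived from the percentage
SELECT ItemId
, SalesPrice
, Cycle
, ChangePct
, EXP(SUM(LOG(1 + ChangePct/100.0)) OVER(PARTITION BY ItemId ORDER BY Cycle)) CumulatedFactor
, SalesPrice * EXP(SUM(LOG(1 + ChangePct/100.0)) OVER(PARTITION BY ItemId ORDER BY Cycle)) NewSalesPrice
FROM @PriceChanges
ORDER BY ItemId
, Cycle;
```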

Happy coding!

13 August 2010

💎SQL Reloaded: Temporary Tables vs. Table Variables and TempDB

Yesterday, I started to read Ken Henderson’s book, SQL Server 2005 Practical Troubleshooting: The Database Engine, diving directly into the tempdb topic (Chapter 9, Tempdb issues). He mentions that metadata are created in system tables when a temporary table is created (see p.415). This means that when a temporary table is created, records must be created in tempdb’s sys.tables and sys.columns system tables, the respective records being deleted when the table is dropped. As I had never looked at what the metadata of a temporary table look like, I thought it was the case to do something in this direction, and here’s the code created for this purpose:

-- creating the temporary tables 
CREATE TABLE #temp ( 
   id int NOT NULL 
, value nvarchar(50) NOT NULL) 

-- retrieving the metadata 
SELECT t.name table_name 
, s.name column_name 
, t.type  
, t.type_desc  
, t.create_date  
FROM tempdb.sys.tables t 
    JOIN tempdb.sys.columns s 
      ON t.object_id = s.object_id 
WHERE t.name LIKE '%temp%' 

-- dropping the temporary table 
-- DROP TABLE #temp  -- see the 2nd note!

[Image: temp vs. variable tables - temporary table example]

Note:
By widening the table_name column, one can see that the object name corresponding to the temporary table is a combination of the table’s name and, according to K. Henderson, the number designating the connection that owns the table.
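As the stored name differs from the one used in the declaration, filtering with LIKE can also return temporary tables created by other sessions. A more precise alternative is to resolve the object id of the session's own table via OBJECT_ID, as in the following sketch (the table name #temp_demo is made up to avoid a clash with the table created above):

```sql
-- creating a temporary table (hypothetical name)
CREATE TABLE #temp_demo (
   id int NOT NULL
, value nvarchar(50) NOT NULL);

-- retrieving the metadata only for the current session's table
SELECT t.name table_name
, LEN(t.name) name_length
, s.name column_name
, t.create_date
FROM tempdb.sys.tables t
    JOIN tempdb.sys.columns s
      ON t.object_id = s.object_id
WHERE t.object_id = OBJECT_ID('tempdb..#temp_demo');
```

The name_length column makes the suffix added by SQL Server visible without having to widen the column in the results grid.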

If the temporary table is stored in tempdb and metadata are stored about it, what happens with a table variable? Here’s the answer:

-- creating the table variable 
DECLARE @temp TABLE( 
  id int NOT NULL  
, value nvarchar(50) NOT NULL) 


-- retrieving the metadata 
SELECT t.name table_name 
, s.name column_name 
, t.type  
, t.type_desc  
, t.create_date  
FROM tempdb.sys.tables t 
     JOIN tempdb.sys.columns s 
      ON t.object_id = s.object_id 
WHERE t.name LIKE '%#%' 
   AND DateDiff(ss, t.create_date, GetDate()) BETWEEN -2 AND 2

[Image: temp vs. variable tables - table variable example]

As can be seen, I had to put in a little more effort in order to see a table variable’s metadata. As there is no name that could be used to identify the table, the object’s name being stored as a hex number, I had to restrict the list of tables by using the creation timestamp. Except for the name, the metadata stored about the two types of tables are identical for the same table definition. Actually, their definition is similar to that of a “standard” table:

-- creating a "standard" table 
CREATE TABLE temp( 
  id int NOT NULL 
, value nvarchar(50) NOT NULL) 

-- retrieving the metadata 
SELECT t.name table_name 
, s.name column_name 
, t.type  
, t.type_desc  
, t.create_date  
FROM sys.tables t 
     JOIN sys.columns s 
      ON t.object_id = s.object_id 
WHERE t.name LIKE '%temp%' 

-- dropping the table 
-- DROP TABLE temp -- see the 2nd note!

[Image: temp vs. variable tables - standard table example]

Notes:
(1) For exemplification I used a restricted list of attributes; when comparing the metadata of the various tables, a SELECT * statement could be used instead. The above examples also reflect the differences in declaring the three types of tables.

(2) Microsoft recommends not to drop the temporary tables explicitly, but let SQL Server handle this cleanup automatically and take thus advantage of the Optimistic Latching Algorithm, which helps prevent contention on TempDB [1].

Last updated: Oct-2024

References:
[1] Haripriya SB (2024) Do NOT drop #temp tables (link)
