Showing posts with label STRING_AGG. Show all posts
Showing posts with label STRING_AGG. Show all posts

27 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part IV: Extracting vs. Concatenating Emails)

As shown in the previous two posts, Copilot helps to generate code even if this may involve multiple prompts and iterations. However, it works and probably new beginners can learn a lot from the code generated in the process. The current post is intended to provide some scenarios you can try in Copilot or any other prompt-based AI.

Disclaimer: The below code was generated by Copilot via prompts for demonstration purposes!

Use Case 1: Extracting Concatenated Emails 

Sometimes one needs to work with distribution lists (list of emails delimited usually by ";"), following to extract the individual emails for the various purposes (e.g. validation, user's identification, etc.). The simplest solution is to use the STRING_SPLIT function introduced in SQL Server 2016, while previously the solutions involved parsing the sting within a loop as part of a stored procedure or user-defined function (UDF). When there's no adequate formatting, it can still be feasible to parse strings within loops. 

Prompt:
Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the procedure. 

Output:

-- stored procedure generated by Copilot
CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Create a table to store the individual emails
    CREATE TABLE #EmailTable (Email NVARCHAR(255));

    -- Split the email list and insert into the table
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO #EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END


    -- Return the result set
    SELECT Email FROM #EmailTable;

    -- Clean up
    DROP TABLE #EmailTable;
END;

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

Prompt:
Create a user-defined function that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the function.

-- create the function
CREATE FUNCTION dbo.SplitEmails (@EmailList NVARCHAR(MAX))
RETURNS @EmailTable TABLE (Email NVARCHAR(255))
AS
BEGIN
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO @EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END

    RETURN;
END;


-- Test the SplitEmails function
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

SELECT * FROM dbo.SplitEmails(@EmailList);

From the two solutions probably the table-valued function (TVF) offers the most flexibility as it can be easier integrated in further logic. It would be interesting to also perform some performance texts. As long as the data volume is neglectable, an UDF is the way to go! 

However, one can directly use the String_Split  function. One can wonder why the function wasn't used in the first place?!

Prompt:

Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails using the String_Split functio. Write the code to test the procedure. 

Output:

CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Use STRING_SPLIT to split the email list into individual emails
    SELECT value AS Email
    FROM STRING_SPLIT(@EmailList, ';');
END;

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

This is probably a good example to show that one should start with a narrow question and expand it as needed. Though, this depends also on the problem to solve.

Use Case 2: Concatenate Emails from Table

It's often needed to concatenate the emails for a Customer or other entity to send standard emails to the parties associated with the respective emails. I wrote over the years several posts on the topic, long before the STRING_AGG function was introduced in SQL Server 2017. Even if the STRING_AGG function can be called directly without the need to use other scaffolding, it was used for its simplicity instead of doing a concatenation.

In the below examples one can see how a small change in the prompt can have an important impact on the solution generated.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table. Provide the code to test the function.

Output:

-- create the user defined function
CREATE FUNCTION dbo.ConcatenateEmails()
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM YourEmailTable;

    RETURN @EmailList;
END;

-- test the function
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmails function
SELECT dbo.ConcatenateEmails() AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table provided as parameter. Provide the code to test the function.

Output: 

CREATE FUNCTION dbo.ConcatenateEmailsFromSP(@TableName NVARCHAR(128))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX);
    DECLARE @EmailList NVARCHAR(MAX);

    SET @SQL = 'SELECT @EmailList = STRING_AGG(Email, '';'') FROM ' + @TableName;

    EXEC sp_executesql @SQL, N'@EmailList NVARCHAR(MAX) OUTPUT', @EmailList OUTPUT;

    RETURN @EmailList;
END;

-- testing the code
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');


-- Test the ConcatenateEmails function
SELECT dbo.ConcatenateEmailsFromSP('YourEmailTable') AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Running the create statements leads however to the following error message:

"Msg 557, Level 16, State 2, Line 3, Only functions and some extended stored procedures can be executed from within a function."

At least in the previous versions of SQL Server, sp_executesql couldn't be executed from a UDF. Even if it's possible to write such code, it will not work, as the example shows! It's to wonder what led to this solution. 

Prompt:

Create a function in SQL Server that concatenates in a list the emails from a table variable provided as parameter. Provide the code to test the function.

Output:

CREATE FUNCTION dbo.ConcatenateEmailsFromTableVar(@EmailTable TABLE (Email NVARCHAR(255)))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;

Executing the code results in the following error message:

"Msg 156, Level 15, State 1, Line 1, Incorrect syntax near the keyword 'TABLE'."

Checking the documentation, the following limitation should be observed [1]:  

"Table-valued parameters must be passed as input READONLY parameters to Transact-SQL routines. You cannot perform DML operations such as UPDATE, DELETE, or INSERT on a table-valued parameter in the body of a routine." [1]

The correct use should be observed in the next example.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table-valued parameter. Provide the code to test the function.

Output:

-- create a table type for the table-valued parameter:
CREATE TYPE EmailTableType AS TABLE 
(
    Email NVARCHAR(255)
)

-- create the table-valued function
CREATE FUNCTION dbo.ConcatenateEmailsFromTVP(@EmailTable EmailTableType READONLY)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;


-- Declare a table variable of the table type and insert some test data
DECLARE @EmailTable EmailTableType;

INSERT INTO @EmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmailsFromTVP function
SELECT dbo.ConcatenateEmailsFromTVP(@EmailTable) AS EmailList;

Final Thoughts

See a previous post for the uses of STRING_SPLIT and STRING_AGG complementary functions. If one doesn't know what is doing, it's easy to go on the wrong path(s) or eventually choose suboptimal solutions even when assisted by AI. If blunders can happen for simple cases, what will happen when the size of the solutions is hundreds and thousands times more complex? How will the blunders multiply?

As it can be seen from the not working examples, Copilot has more problems besides hallucinations. Even if semantically some solutions are viable, the code might break in contact with reality! Moreover, even if something works, one needs to weigh solutions in terms of performance and use specific techniques to address them! AI is more likely feasible in working with high-level programming languages like C++ or Java that hopefully offer fewer chances for mistakes.

Once the code templates are generated, the code can be extended as needed. Frankly, even if the code is quite basic, the result is impressive considered through the amount of work involved to generate it. One can but wonder whether the code was taken from a source, part of the training, respectively part of the AI heuristics. It can probably be a combination of all.

Even if Copilot generated the template, the developers still need to be able to implement it in the broader solution, which sometimes might be more challenging than expected, especially when the details matter. The devil is in the details after all! Developers must also recognize which solution is more appropriate in terms of flexibility, maintenance, performance or whatever criteria may apply. I was wondering whether one can change the code to a degree that one can't recognize anymore that it was written by Copilot. Any ideas?

It's to be expected that in the near future such tools will be able to provide more complex solutions. I will not say that programming is dead, as some proclaim, though that will shift its focus in the area of prompt engineering. Somebody, and I mean here the programmer or other technical person, must be able to evaluate critically and qualitatively the results, respectively identify the blunders and other types of issues resulting in the process!

Happy coding!

Previous Post <<||>> Next Post 

References:
[1] Microsoft Learn (2024) SQL Server 2022: Use table-valued parameters [link]

Acronyms:
DML - Data Manipulation Language
TVF - Table-Valued Function
UDF - User-Defined Function

05 November 2022

💎SQL Reloaded: STRING_AGG and STRING_SPLIT at Work, and a Bit of Pivoting

Working with strings across records was for long a nightmare for SQL developers until Microsoft introduced STRING_SPLIT in SQL Server 2016, respectively STRING_AGG in SQL Server 2017.  Previously, one was forced to write procedural language or use workarounds until SQL Server 2015, when recursive CTEs (common table expressions), Ranking and PIVOT were introduced, which allowed handling many scenarios. 

Microsoft provides several examples for the usage of STRING_SPLIT and STRING_AGG functions based on AdventureWorks database, though let's look at another example based on the same database. 

Let's say we want to show the concatenated Contacts for a store, result which can now easily be obtained by using the STRING_AGG:

-- concatenating names per store via STRING_AGG (SQL Server 2017+)
SELECT BusinessEntityID
, STRING_AGG(Concat(FirstName, ' ', LastName), ';') Contacts
FROM Sales.vStoreWithContacts
GROUP BY BusinessEntityID
HAVING count(*)>1

Observe that is needed to use a GROUP BY to show one record per Store. Unfortunately, there isn't yet a window function available for the same. 

The inverse operation can be performed with the help of STRING_SPLIT table-valued function (TVF). (If you wonder why is needed a TVF, it is because the initial record needs to be multiplied by the generated output.)

-- reversing the concatenation (SQL Server 2017+)
WITH CTE
AS (
	-- concatenating names per store
	SELECT BusinessEntityID
	, STRING_AGG(Concat(FirstName, ' ', LastName), ';') Contacts
	FROM Sales.vStoreWithContacts
	GROUP BY BusinessEntityID
	HAVING count(*)>1
) 
SELECT CTE.BusinessEntityID
, DAT.Value
, DAT.Ordinal 
FROM CTE
    CROSS APPLY STRING_SPLIT(Contacts, ';', 1) DAT

STRING_SPLIT provides also an ordinal field, which can be used in theory in pivoting the values, though we'd return then from where we started. Instead of using the query just generated, let's exemplify an alternative solution which is available with SQL Server 2005 for concatenating strings across records:
 
-- concatenating names per store via PIVOT (SQL Server 2012+)
SELECT BusinessEntityID
, [1] Contact1
, [2] Contact2
, [3] Contact3
, [4] Contact4
, Concat([1], IsNull(';' + [2], ''), IsNull(';' + [3], ''), IsNull(';' + [4], '')) Contacts
FROM (
	-- concatenating names and adding a rank
	SELECT BusinessEntityID
	, Concat(FirstName, ' ', LastName) Contact
	, ROW_NUMBER() OVER(PARTITION BY BusinessEntityID ORDER BY FirstName) Ranking
	FROM Sales.vStoreWithContacts
) PER
PIVOT (
    Max(Contact)
	FOR Ranking IN ([1], [2], [3], [4])
) AS DAT

It's needed to rewrite the Concat function to port the code on SQL Server 2005 though. 

Talking about workarounds for splitting strings, in certain scenarios I used a combination of CutLeft & CutRight functions, which proved to be useful in data migrations, or use my own version of STRING_SPLIT (see SplitListWithIndex or SplitList). For concatenations I used mainly CTEs (see example) or cursors for exceptional cases (see example).

Happy coding!
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.