28 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part V: Common Table Expressions for Employee Hierarchies and Interest Rates)

Continuing the series of posts on the use of Copilot for generating simple solutions, in this post let's consider the generation of Common Table Expressions (CTEs) based on several examples.

Disclaimer: The below code was generated by Copilot via prompts used for demonstration purposes!

Use Case 1: Employee Hierarchies

One of the most basic examples in which recursive CTE are used is the traversing of an organization's hierarchy, which can have multiple levels. The following prompts attempt step by step to provide different approaches for traversing the hierarchy. 

One should run the prompts individually.

Prompt(s):
Create a query based on a common table expression that traverses a hierarchy in an organization.
Generate an example on which the query can be tested.
Rewrite the queries by placing the comma first.
Rewrite the queries and condense their structure.

Output:

--Creating the Employees Table
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY, EmployeeName VARCHAR(100), ManagerID INT
);

INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID) VALUES
    (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Charlie', 1),
    (4, 'David', 2), (5, 'Eve', 2), (6, 'Frank', 3), (7, 'Grace', 3);


-- CTE Query for Hierarchy Traversal
WITH EmployeeHierarchy AS (
    SELECT EmployeeID, EmployeeName, ManagerID, 0 AS Level
    FROM Employees WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1
    FROM Employees e INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT EmployeeID, EmployeeName, ManagerID, Level
FROM EmployeeHierarchy
ORDER BY Level, EmployeeName;

It would be great if the query would return also Manager's name:

Prompt(s):
Rewrite the final query based on EmployeeHierarchy and condense their structure.

Output:

WITH EmployeeHierarchy AS (
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, 0 AS Level, NULL AS ManagerName
    FROM Employees e WHERE e.ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1, m.EmployeeName
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
    LEFT JOIN Employees m ON e.ManagerID = m.EmployeeID
)
SELECT eh.EmployeeID, eh.EmployeeName, eh.ManagerID, eh.Level, eh.ManagerName
FROM EmployeeHierarchy eh
ORDER BY eh.Level, eh.EmployeeName;

Unfortunately, running the query leads to the following error message:
"Msg 462, Level 16, State 1, Line 3, Outer join is not allowed in the recursive part of a recursive common table expression 'EmployeeHierarchy'."

Prompt:
Rewrite the query to fix the error "Outer join is not allowed in the recursive part of a recursive common table expression 'EmployeeHierarchy'."

Output:

WITH EmployeeHierarchy AS (
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, 0 AS Level
    FROM Employees e WHERE e.ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT eh.EmployeeID, eh.EmployeeName, eh.ManagerID, eh.Level, m.EmployeeName AS ManagerName
FROM EmployeeHierarchy eh
LEFT JOIN Employees m ON eh.ManagerID = m.EmployeeID
ORDER BY eh.Level, eh.EmployeeName;

Now the output looks much better. It will be interesting to see whether Copilot learns in time from such errors. The error and the fact that Copilot was able to generate the proper code after the correction are a bit unexpected.

Prompt:
Generate a sample table for the problem containing five levels. Condense the statements. Generate the statement for dropping the existing table.

Output:

-- Dropping the Existing Table
DROP TABLE IF EXISTS Employees;

-- Creating the Employees Table
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY, EmployeeName VARCHAR(100), ManagerID INT
);

INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID) VALUES
    (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Charlie', 1),
    (4, 'David', 2), (5, 'Eve', 2), (6, 'Frank', 3), (7, 'Grace', 3),
    (8, 'Hannah', 4), (9, 'Ivy', 4), (10, 'Jack', 5), (11, 'Karen', 5),
    (12, 'Leo', 6), (13, 'Mona', 6), (14, 'Nina', 7), (15, 'Oscar', 7),
    (16, 'Paul', 8), (17, 'Quinn', 8), (18, 'Rita', 9), (19, 'Sam', 9),
    (20, 'Tom', 10), (21, 'Uma', 10), (22, 'Vera', 11), (23, 'Will', 11),
    (24, 'Xena', 12), (25, 'Yara', 12), (26, 'Zane', 13), (27, 'Amy', 13),
    (28, 'Ben', 14), (29, 'Cathy', 14), (30, 'Dan', 15), (31, 'Ella', 15);

Prompt:
Modify the last query to include the full path from the Manager to the Employee.

Output:

-- CTE Query for Hierarchy Traversal with Full Path
WITH EmployeeHierarchy AS (
    SELECT 
        e.EmployeeID
        , e.EmployeeName
        , e.ManagerID
        , 0 AS Level
        , CAST(e.EmployeeName AS VARCHAR(MAX)) AS Path
    FROM 
        Employees e 
    WHERE 
        e.ManagerID IS NULL
    UNION ALL
    SELECT 
        e.EmployeeID
        , e.EmployeeName
        , e.ManagerID
        , eh.Level + 1
        , CAST(eh.Path + ' -> ' + e.EmployeeName AS VARCHAR(MAX))
    FROM 
        Employees e
    INNER JOIN 
        EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT 
    eh.EmployeeID
    , eh.EmployeeName
    , eh.ManagerID
    , eh.Level
    , eh.Path
FROM 
    EmployeeHierarchy eh
ORDER BY 
    eh.Level
    , eh.EmployeeName;

The more lines the code has, the more it makes sense to expand the query as above. It would be great if one could apply a set of prompts to an entire session (e.g. always include the schema when calling an object, always format the code in a certain way, etc.).

Use Case 2: Interest Rates

Another often met problem is to calculate the interest rate for several years based on a list of items. 

Prompt(s):
Create a query based on a common tables expression that applies interest rate of .05 on a yearly basis to a list of items for the past 8 years with projection over other 2 years. Generate an example on which the query can be tested.
Add distinct dates to the table as starting points for the simulation.
Add the schema name to the statements.

Output:

-- Create the table with initial data
CREATE TABLE finance.items (
    item_id INT PRIMARY KEY,
    item_name VARCHAR(50),
    initial_value DECIMAL(10, 2),
    start_date DATE
);

-- Insert example data into the table
INSERT INTO finance.items (item_id, item_name, initial_value, start_date) VALUES
(1, 'Item A', 1000.00, '2017-01-01'),
(2, 'Item B', 1500.00, '2017-06-01'),
(3, 'Item C', 2000.00, '2018-01-01');

-- CTE to calculate the value over the years
WITH ValueProjection AS (
    SELECT
        item_id,
        item_name,
        initial_value AS value,
        start_date,
        start_date AS projection_date,
        0 AS year
    FROM
        finance.items
    UNION ALL
    SELECT
        vp.item_id,
        vp.item_name,
        vp.value * 1.05 AS value,
        vp.start_date,
        DATEADD(YEAR, 1, vp.projection_date) AS projection_date,
        vp.year + 1 AS year
    FROM
        ValueProjection vp
    WHERE
        vp.year < 9
)
SELECT
    item_id,
    item_name,
    value,
    projection_date
FROM
    ValueProjection
ORDER BY
    item_id,
    projection_date;

In a second attempt, in a second session, Copilot create the logic in the Finance schema (rather than in dbo) and left out the statement for creating it, which is a bit surprising. This can be easily fixed though:

Prompt(s):
Write the statement for creating the schema. 

Output

-- Create the schema
CREATE SCHEMA finance;

Trying to run the query lead to the well-known error message:

"Msg 240, Level 16, State 1, Line 3, Types don't match between the anchor and the recursive part in column "value" of recursive query "ValueProjection"."

Prompt(s):
Rewrite the query to fix the error "Msg 240, Level 16, State 1, Line 3, Types don't match between the anchor and the recursive part in column "value" of recursive query "ValueProjection"."

Output

-- CTE to calculate the value over the years
WITH ValueProjection AS (
    SELECT
        item_id,
        item_name,
        CAST(initial_value AS DECIMAL(10, 2)) AS value,
        start_date,
        start_date AS projection_date,
        0 AS year
    FROM
        finance.items
    UNION ALL
    SELECT
        vp.item_id,
        vp.item_name,
        CAST(vp.value * 1.05 AS DECIMAL(10, 2)) AS value,
        vp.start_date,
        DATEADD(YEAR, 1, vp.projection_date) AS projection_date,
        vp.year + 1 AS year
    FROM
        ValueProjection vp
    WHERE
        vp.year < 9
)
SELECT
    item_id,
    item_name,
    value,
    projection_date
FROM
    ValueProjection
ORDER BY
    item_id,
    projection_date;

This time the query runs successfully.

CTEs are useful for solving many problems, though are there alternatives? Copilot used a loop for obtaining the same result:

Prompt(s):
Write a query to obtain the same result without the use of common table expressions.

Output:

-- Create a temporary table to store the projections
CREATE TABLE #ValueProjection (
    item_id INT,
    item_name VARCHAR(50),
    value DECIMAL(10, 2),
    projection_date DATE,
    year INT
);

-- Insert initial values into the temporary table
INSERT INTO #ValueProjection (item_id, item_name, value, projection_date, year)
SELECT
    item_id,
    item_name,
    initial_value,
    start_date,
    0
FROM
    finance.items;

-- Loop to calculate the value over the years
DECLARE @year INT = 1;
WHILE @year <= 9
BEGIN
    INSERT INTO #ValueProjection (item_id, item_name, value, projection_date, year)
    SELECT
        vp.item_id,
        vp.item_name,
        CAST(vp.value * 1.05 AS DECIMAL(10, 2)) AS value,
        DATEADD(YEAR, 1, vp.projection_date) AS projection_date,
        @year
    FROM
        #ValueProjection vp
    WHERE
        vp.year = @year - 1;

    SET @year = @year + 1;
END;

-- Select the final results
SELECT
    item_id,
    item_name,
    value,
    projection_date
FROM
    #ValueProjection
ORDER BY
    item_id,
    projection_date;

-- Drop the temporary table
DROP TABLE #ValueProjection;

The query seems to be correct, despite the fact that in a previous session the dates were not correctly considered, existing bigger gaps in between than 1 year for some records. Attempts to replicate the issue were unsuccessful, though this behavior is strange. One should always test the queries thoroughly, especially when they were generated by Copilot!

It's important to always consider adding the schema to the objects created, and this should be part of the initial prompt!

One can further increase the complexity of the query. 

One should try also to see if further solutions exist. For example, the next prompt generated a solution based on UNIONs, which usually is not an effective solution even if it gets the job done!

Prompt:
Write a query to obtain the same result without the use of common table expressions or loops.

Output:

-- Query to project values over 10 years
SELECT
    item_id,
    item_name,
    initial_value AS value,
    start_date AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * 1.05 AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 1, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 2) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 2, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 3) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 3, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 4) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 4, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 5) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 5, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 6) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 6, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 7) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 7, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 8) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 8, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 9) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 9, start_date) AS projection_date
FROM
    finance.items
-- Order the results
ORDER BY
    item_id,
    projection_date;

Are these all possible solutions? One can test this via the following prompt which can be used again until all solutions were exhausted:

Prompt(s):
Write a query to obtain the same result without the use of the previous methods used above.

Output:

-- Use a cross join to generate the projections for each year.
-- Create a numbers table to represent the years
CREATE TABLE finance.numbers (
    n INT PRIMARY KEY
);

-- Insert numbers from 0 to 9 into the numbers table
INSERT INTO finance.numbers (n) VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- Query to project values over 10 years
SELECT
    i.item_id,
    i.item_name,
    CAST(i.initial_value * POWER(1.05, n.n) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, n.n, i.start_date) AS projection_date
FROM
    finance.items i
CROSS JOIN
    finance.numbers n
ORDER BY
    i.item_id,
    n.n;

Frankly, this is the most compact solution! I used variations of it in other contexts via a table created on the fly or a table-valued function. The solution can be easily extended to consider different rates.

Let's try another prompt:

Prompt:
Extend the last query to include different rates for each year that increase or decrease by 0.01 percent.

Output:

-- Query to project values over 10 years with varying interest rates 
SELECT
    i.item_id,
    i.item_name,
    CAST(i.initial_value * EXP(SUM(LOG(1 + 0.05 + (n.n * 0.01)))) OVER (PARTITION BY i.item_id ORDER BY n.n) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, n.n, i.start_date) AS projection_date
FROM
    finance.items i
CROSS JOIN
    finance.numbers n
ORDER BY
    i.item_id,
    n.n;

Unfortunately, this led again to an error:

"Msg 4113, Level 15, State 6, Line 5, The function 'EXP' is not a valid windowing function, and cannot be used with the OVER clause."

Though, it can be fixed with the prompt:

Prompt:
Fix the error "Msg 4113, Level 15, State 6, Line 5, The function 'EXP' is not a valid windowing function, and cannot be used with the OVER clause."

Output:

-- Query to project values over 10 years with varying interest rates
SELECT
    i.item_id,
    i.item_name,
    CAST(i.initial_value * (
        SELECT EXP(SUM(LOG(1 + 0.05 + (n2.n * 0.01))))
        FROM finance.numbers n2
        WHERE n2.n <= n.n
    ) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, n.n, i.start_date) AS projection_date
FROM
    finance.items i
CROSS JOIN
    finance.numbers n
ORDER BY
    i.item_id,
    n.n;

I used the same solution in a previous post, and it isn't trivial at all, even if the trick is simple!

Is this everything? Probably, not! Running again the first statement from the first section in a new session led to the following query, which can be transposed if needed:

-- Query with CTE
WITH InterestProjection AS (
    SELECT
        item_id,
        item_name,
        start_date,
        initial_value,
        initial_value * POWER(1.05, 0) AS value_year_0,
        initial_value * POWER(1.05, 1) AS value_year_1,
        initial_value * POWER(1.05, 2) AS value_year_2,
        initial_value * POWER(1.05, 3) AS value_year_3,
        initial_value * POWER(1.05, 4) AS value_year_4,
        initial_value * POWER(1.05, 5) AS value_year_5,
        initial_value * POWER(1.05, 6) AS value_year_6,
        initial_value * POWER(1.05, 7) AS value_year_7,
        initial_value * POWER(1.05, 8) AS value_year_8,
        initial_value * POWER(1.05, 9) AS value_year_9,
        initial_value * POWER(1.05, 10) AS value_year_10
    FROM
        finance.items
)
SELECT
    item_id,
    item_name,
    start_date,
    value_year_0,
    value_year_1,
    value_year_2,
    value_year_3,
    value_year_4,
    value_year_5,
    value_year_6,
    value_year_7,
    value_year_8,
    value_year_9,
    value_year_10
FROM
    InterestProjection;

One should rerun the prompt also when it appears that all the possible solutions were exhausted!

Prompt(s):
Write a query to obtain the same result without the use of the previous methods used above.

Output:

WITH InterestProjection AS (
    SELECT
        item_id,
        item_name,
        start_date,
        initial_value,
        CAST(initial_value AS DECIMAL(10, 2)) AS value,
        0 AS year
    FROM
        finance.items
    UNION ALL
    SELECT
        ip.item_id,
        ip.item_name,
        ip.start_date,
        ip.initial_value,
        CAST(ip.value * 1.05 AS DECIMAL(10, 2)) AS value,
        ip.year + 1 AS year
    FROM
        InterestProjection ip
    WHERE
        ip.year < 10
)
SELECT
    item_id,
    item_name,
    start_date,
    MAX(CASE WHEN year = 0 THEN value END) AS value_year_0,
    MAX(CASE WHEN year = 1 THEN value END) AS value_year_1,
    MAX(CASE WHEN year = 2 THEN value END) AS value_year_2,
    MAX(CASE WHEN year = 3 THEN value END) AS value_year_3,
    MAX(CASE WHEN year = 4 THEN value END) AS value_year_4,
    MAX(CASE WHEN year = 5 THEN value END) AS value_year_5,
    MAX(CASE WHEN year = 6 THEN value END) AS value_year_6,
    MAX(CASE WHEN year = 7 THEN value END) AS value_year_7,
    MAX(CASE WHEN year = 8 THEN value END) AS value_year_8,
    MAX(CASE WHEN year = 9 THEN value END) AS value_year_9,
    MAX(CASE WHEN year = 10 THEN value END) AS value_year_10
FROM
    InterestProjection
GROUP BY
    item_id, item_name, start_date;

Copilot missed again to cast the value in the CTE to decimal, respectively introduced the RECURSIVE keyword which had to be removed, though it worked after the chances! Frankly, database engines should be smart enough to cast values in similar cases (though this may probably lead to curious cases). 

Further attempts led to the creation of a stored procedure to encapsulate the logic, respectively to other interesting variations. Though, after some time, the solutions started to reappear, even if the differences are minor. 

Conclusion

What can be learned from this exercise? All the roads lead to Rome, or at least it seems to be the case in this post considering that multiple methods could be used to solve the same problem. More important, one shouldn't stop exploring just because a solution is already available! 

Looking forward at the use of Copilot, ideally, one should formulate the simplest pattern and start exploring incrementally by changing the initial parameters, approaching the problem from different perspectives. Copilot might not solve all problems, though it's good at finding alternatives. 

Happy coding!

Previous Post <<||>> Next Post

27 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part IV: Extracting vs. Concatenating Emails)

As shown in the previous two posts, Copilot helps to generate code even if this may involve multiple prompts and iterations. However, it works and probably new beginners can learn a lot from the code generated in the process. The current post is intended to provide some scenarios you can try in Copilot or any other prompt-based AI.

Disclaimer: The below code was generated by Copilot via prompts for demonstration purposes!

Use Case 1: Extracting Concatenated Emails 

Sometimes one needs to work with distribution lists (list of emails delimited usually by ";"), following to extract the individual emails for the various purposes (e.g. validation, user's identification, etc.). The simplest solution is to use the STRING_SPLIT function introduced in SQL Server 2016, while previously the solutions involved parsing the sting within a loop as part of a stored procedure or user-defined function (UDF). When there's no adequate formatting, it can still be feasible to parse strings within loops. 

Prompt:
Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the procedure. 

Output:

-- stored procedure generated by Copilot
CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Create a table to store the individual emails
    CREATE TABLE #EmailTable (Email NVARCHAR(255));

    -- Split the email list and insert into the table
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO #EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END


    -- Return the result set
    SELECT Email FROM #EmailTable;

    -- Clean up
    DROP TABLE #EmailTable;
END;

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

Prompt:
Create a user-defined function that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the function.

-- create the function
CREATE FUNCTION dbo.SplitEmails (@EmailList NVARCHAR(MAX))
RETURNS @EmailTable TABLE (Email NVARCHAR(255))
AS
BEGIN
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO @EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END

    RETURN;
END;


-- Test the SplitEmails function
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

SELECT * FROM dbo.SplitEmails(@EmailList);

From the two solutions probably the table-valued function (TVF) offers the most flexibility as it can be easier integrated in further logic. It would be interesting to also perform some performance texts. As long as the data volume is neglectable, an UDF is the way to go! 

However, one can directly use the String_Split  function. One can wonder why the function wasn't used in the first place?!

Prompt:

Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails using the String_Split functio. Write the code to test the procedure. 

Output:

CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Use STRING_SPLIT to split the email list into individual emails
    SELECT value AS Email
    FROM STRING_SPLIT(@EmailList, ';');
END;

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

This is probably a good example to show that one should start with a narrow question and expand it as needed. Though, this depends also on the problem to solve.

Use Case 2: Concatenate Emails from Table

It's often needed to concatenate the emails for a Customer or other entity to send standard emails to the parties associated with the respective emails. I wrote over the years several posts on the topic, long before the STRING_AGG function was introduced in SQL Server 2017. Even if the STRING_AGG function can be called directly without the need to use other scaffolding, it was used for its simplicity instead of doing a concatenation.

In the below examples one can see how a small change in the prompt can have an important impact on the solution generated.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table. Provide the code to test the function.

Output:

-- create the user defined function
CREATE FUNCTION dbo.ConcatenateEmails()
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM YourEmailTable;

    RETURN @EmailList;
END;

-- test the function
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmails function
SELECT dbo.ConcatenateEmails() AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table provided as parameter. Provide the code to test the function.

Output: 

CREATE FUNCTION dbo.ConcatenateEmailsFromSP(@TableName NVARCHAR(128))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX);
    DECLARE @EmailList NVARCHAR(MAX);

    SET @SQL = 'SELECT @EmailList = STRING_AGG(Email, '';'') FROM ' + @TableName;

    EXEC sp_executesql @SQL, N'@EmailList NVARCHAR(MAX) OUTPUT', @EmailList OUTPUT;

    RETURN @EmailList;
END;

-- testing the code
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');


-- Test the ConcatenateEmails function
SELECT dbo.ConcatenateEmailsFromSP('YourEmailTable') AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Running the create statements leads however to the following error message:

"Msg 557, Level 16, State 2, Line 3, Only functions and some extended stored procedures can be executed from within a function."

At least in the previous versions of SQL Server, sp_executesql couldn't be executed from a UDF. Even if it's possible to write such code, it will not work, as the example shows! It's to wonder what led to this solution. 

Prompt:

Create a function in SQL Server that concatenates in a list the emails from a table variable provided as parameter. Provide the code to test the function.

Output:

CREATE FUNCTION dbo.ConcatenateEmailsFromTableVar(@EmailTable TABLE (Email NVARCHAR(255)))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;

Executing the code results in the following error message:

"Msg 156, Level 15, State 1, Line 1, Incorrect syntax near the keyword 'TABLE'."

Checking the documentation, the following limitation should be observed [1]:  

"Table-valued parameters must be passed as input READONLY parameters to Transact-SQL routines. You cannot perform DML operations such as UPDATE, DELETE, or INSERT on a table-valued parameter in the body of a routine." [1]

The correct use should be observed in the next example.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table-valued parameter. Provide the code to test the function.

Output:

-- create a table type for the table-valued parameter:
CREATE TYPE EmailTableType AS TABLE 
(
    Email NVARCHAR(255)
)

-- create the table-valued function
CREATE FUNCTION dbo.ConcatenateEmailsFromTVP(@EmailTable EmailTableType READONLY)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;


-- Declare a table variable of the table type and insert some test data
DECLARE @EmailTable EmailTableType;

INSERT INTO @EmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmailsFromTVP function
SELECT dbo.ConcatenateEmailsFromTVP(@EmailTable) AS EmailList;

Final Thoughts

See a previous post for the uses of STRING_SPLIT and STRING_AGG complementary functions. If one doesn't know what is doing, it's easy to go on the wrong path(s) or eventually choose suboptimal solutions even when assisted by AI. If blunders can happen for simple cases, what will happen when the size of the solutions is hundreds and thousands times more complex? How will the blunders multiply?

As it can be seen from the not working examples, Copilot has more problems besides hallucinations. Even if semantically some solutions are viable, the code might break in contact with reality! Moreover, even if something works, one needs to weigh solutions in terms of performance and use specific techniques to address them! AI is more likely feasible in working with high-level programming languages like C++ or Java that hopefully offer fewer chances for mistakes.

Once the code templates are generated, the code can be extended as needed. Frankly, even if the code is quite basic, the result is impressive considered through the amount of work involved to generate it. One can but wonder whether the code was taken from a source, part of the training, respectively part of the AI heuristics. It can probably be a combination of all.

Even if Copilot generated the template, the developers still need to be able to implement it in the broader solution, which sometimes might be more challenging than expected, especially when the details matter. The devil is in the details after all! Developers must also recognize which solution is more appropriate in terms of flexibility, maintenance, performance or whatever criteria may apply. I was wondering whether one can change the code to a degree that one can't recognize anymore that it was written by Copilot. Any ideas?

It's to be expected that in the near future such tools will be able to provide more complex solutions. I will not say that programming is dead, as some proclaim, though that will shift its focus in the area of prompt engineering. Somebody, and I mean here the programmer or other technical person, must be able to evaluate critically and qualitatively the results, respectively identify the blunders and other types of issues resulting in the process!

Happy coding!

Previous Post <<||>> Next Post 

References:
[1] Microsoft Learn (2024) SQL Server 2022: Use table-valued parameters [link]

Acronyms:
DML - Data Manipulation Language
TVF - Table-Valued Function
UDF - User-Defined Function

26 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part III: Formatting Code via Prompts)

When debugging poorly formatted code, usually it helps to format first the code before doing any other work on it. I observed in the previous post that Copilot can be used to restructure the code based on a set of commands that summarize the best practices. 

Example 1: Simple Query

Let's start with a simple example from the web

Prompt:
Format the below code by using the available SQL best practices
1) each join condition is written on its own line
2) the commas are placed at the beginning of each line
3) use 3 letter based aliases for all tables
4) compact the code by reducing the indentation

SELECT po.ExpectedDeliveryDate, po.OrderDate, pol.LastReceiptDate, pol.LastEditedWhen, p.ValidFrom, p.ValidTo FROM Purchasing.PurchaseOrders AS po JOIN Purchasing.PurchaseOrderLines AS pol ON pol.PurchaseOrderID = po.PurchaseOrderID JOIN Application.People AS p ON p.PersonID = po.ContactPersonID WHERE LastReceiptDate > '1/1/2016';

Output:

SELECT po.ExpectedDeliveryDate
    , po.OrderDate
    , pol.LastReceiptDate
    , pol.LastEditedWhen
    , p.ValidFrom
    , p.ValidTo
FROM Purchasing.PurchaseOrders AS po
JOIN Purchasing.PurchaseOrderLines AS pol
    ON pol.PurchaseOrderID = po.PurchaseOrderID
JOIN Application.People AS p
    ON p.PersonID = po.ContactPersonID
WHERE LastReceiptDate > '1/1/2016';

Of course, the colors were added by the hilite.me formatting to better identify the reserved words. The output of the above prompt is not exactly what I intended in 3), so repeating the slightly changed command seems to obtain the expected result. Even if a 3-letter alias was not used for all tables, the query looks better:

Prompt:
Use 3 letter based aliases for all tables.

Output:

SELECT po.ExpectedDeliveryDate
    , po.OrderDate
    , pol.LastReceiptDate
    , pol.LastEditedWhen
    , ppl.ValidFrom
    , ppl.ValidTo
FROM Purchasing.PurchaseOrders AS po
JOIN Purchasing.PurchaseOrderLines AS pol
    ON pol.PurchaseOrderID = po.PurchaseOrderID
JOIN Application.People AS ppl
    ON ppl.PersonID = po.ContactPersonID
WHERE LastReceiptDate > '1/1/2016';

The script looks good, though it would be great if we could align the constraints with the order of the tables.

Prompt:
Rewrite the query by using the first table from the join in the join constraint.

Output:

SELECT po.ExpectedDeliveryDate
    , po.OrderDate
    , pol.LastReceiptDate
    , pol.LastEditedWhen
    , ppl.ValidFrom
    , ppl.ValidTo
FROM Purchasing.PurchaseOrders AS po
JOIN Purchasing.PurchaseOrderLines AS pol
    ON po.PurchaseOrderID = pol.PurchaseOrderID
JOIN Application.People AS ppl
    ON po.ContactPersonID = ppl.PersonID
WHERE LastReceiptDate > '1/1/2016';

As observed in the last post, multiple scripts can be transformed withing the same prompt, especially when the statements were generated by Copilot as well. 

Example 2: Complex Dynamics AX Query

How about a more complex query? 
Let's consider the below query written for Dynamics AX:

Prompt:
Format the below code by using the available SQL best practices
1) each join condition is written on its own line
2) the commas are placed at the beginning of each line
3) use 3 letter based aliases for all tables
4) compact the code by reducing the indentation

SELECT ADR.DATAAREAID, DPT.FIRSTNAME, DPT.LASTNAME, DPT.LANGUAGEID, DPR.PartyId, DPR.IsPrimary, CASE DPR.Status WHEN 0 THEN 'Active' WHEN 1 THEN 'Inactive' END Status, NullIf(Cast(DPR.ValidFromDateTime as Date), '31-12-1900') ValidFromDateTime, NullIf(Cast(DPR.ValidToDateTime as Date), '30-12-2154') ValidToDateTime, DPR.Shared  , ADR.ADDRTABLEID , ADR.ADDRRECID , ADR.TYPE , CASE ADR.TYPE WHEN 0 THEN 'None' WHEN 1 THEN 'Invoice' WHEN 2 THEN 'Delivery'WHEN 3 THEN 'Alt. delivery'WHEN 4 THEN 'SWIFT'WHEN 5 THEN 'Payment' WHEN 6 THEN 'Service' WHEN 7 THEN 'Home' WHEN 8 THEN 'Other' WHEN 9 THEN 'Business' WHEN 10 THEN 'RemitTo' WHEN 11 THEN 'Third-party shipping address' END AddressDescription, ADR.NAME , ADR.ADDRESS , ADR.STREET, ADR.ZIPCODE , ADR.CITY , ADR.STATE , ADR.COUNTY , ADR.COUNTRYREGIONID , ADR.PHONE , ADR.TELEFAX , ADR.URL FROM ADDRESS ADR JOIN DIRPARTYADDRESSRELATIONSHI1066 DAC ON DAC.REFCOMPANYID = ADR.DATAAREAID and DAC.ADDRESSRECID = ADR.RECID and DAC.DATAAREAID = ADR.DATAAREAID JOIN DIRPARTYADDRESSRELATIONSHIP DPR on DPR.RECID = DAC.PARTYADDRESSRELATIONSHIPRECID and DPR.DATAAREAID = DAC.DATAAREAID JOIN DIRPARTYTABLE DPT on DPT.PARTYID = DPR.PARTYID and DPT.DATAAREAID = DPR.DATAAREAID  JOIN EmplTable EMP on DPT.PARTYID = EMP.PARTYID and DPT.DATAAREAID = EMP.DATAAREAID  

Output:

SELECT 
    ADR.DATAAREAID
    , DPT.FIRSTNAME
    , DPT.LASTNAME
    , DPT.LANGUAGEID
    , DPR.PartyId
    , DPR.IsPrimary
    , CASE DPR.Status 
        WHEN 0 THEN 'Active' 
        WHEN 1 THEN 'Inactive' 
      END AS Status
    , NullIf(Cast(DPR.ValidFromDateTime AS DATE), '31-12-1900') AS ValidFromDateTime
    , NullIf(Cast(DPR.ValidToDateTime AS DATE), '30-12-2154') AS ValidToDateTime
    , DPR.Shared
    , ADR.ADDRTABLEID
    , ADR.ADDRRECID
    , ADR.TYPE
    , CASE ADR.TYPE 
        WHEN 0 THEN 'None' 
        WHEN 1 THEN 'Invoice' 
        WHEN 2 THEN 'Delivery'
        WHEN 3 THEN 'Alt. delivery'
        WHEN 4 THEN 'SWIFT'
        WHEN 5 THEN 'Payment' 
        WHEN 6 THEN 'Service' 
        WHEN 7 THEN 'Home' 
        WHEN 8 THEN 'Other' 
        WHEN 9 THEN 'Business' 
        WHEN 10 THEN 'RemitTo' 
        WHEN 11 THEN 'Third-party shipping address' 
      END AS AddressDescription
    , ADR.NAME
    , ADR.ADDRESS
    , ADR.STREET
    , ADR.ZIPCODE
    , ADR.CITY
    , ADR.STATE
    , ADR.COUNTY
    , ADR.COUNTRYREGIONID
    , ADR.PHONE
    , ADR.TELEFAX
    , ADR.URL
FROM 
    ADDRESS ADR
JOIN 
    DIRPARTYADDRESSRELATIONSHI1066 DAC 
    ON DAC.REFCOMPANYID = ADR.DATAAREAID 
    AND DAC.ADDRESSRECID = ADR.RECID 
    AND DAC.DATAAREAID = ADR.DATAAREAID
JOIN 
    DIRPARTYADDRESSRELATIONSHIP DPR 
    ON DPR.RECID = DAC.PARTYADDRESSRELATIONSHIPRECID 
    AND DPR.DATAAREAID = DAC.DATAAREAID
JOIN 
    DIRPARTYTABLE DPT 
    ON DPT.PARTYID = DPR.PARTYID 
    AND DPT.DATAAREAID = DPR.DATAAREAID
JOIN 
    EmplTable EMP 
    ON DPT.PARTYID = EMP.PARTYID 
    AND DPT.DATAAREAID = EMP.DATAAREAID;

The query looks good, though it needs more transformations:

Input (to be run line by line):

Rewrite the query by writing names in camel with the beginning letter in upper case.
Rewrite the query by writing table aliases in upper case.

Output:

SELECT 
      ADR.DataAreaId
    , DPT.FirstName
    , DPT.LastName
    , DPT.LanguageId
    , DPR.PartyId
    , DPR.IsPrimary
    , CASE DPR.Status 
        WHEN 0 THEN 'Active' 
        WHEN 1 THEN 'Inactive' 
      END AS Status
    , NullIf(Cast(DPR.ValidFromDateTime AS DATE), '31-12-1900') AS ValidFromDateTime
    , NullIf(Cast(DPR.ValidToDateTime AS DATE), '30-12-2154') AS ValidToDateTime
    , DPR.Shared
    , ADR.AddrTableId
    , ADR.AddrRecId
    , ADR.Type
    , CASE ADR.Type 
        WHEN 0 THEN 'None' 
        WHEN 1 THEN 'Invoice' 
        WHEN 2 THEN 'Delivery' 
        WHEN 3 THEN 'Alt. Delivery' 
        WHEN 4 THEN 'Swift' 
        WHEN 5 THEN 'Payment' 
        WHEN 6 THEN 'Service' 
        WHEN 7 THEN 'Home' 
        WHEN 8 THEN 'Other' 
        WHEN 9 THEN 'Business' 
        WHEN 10 THEN 'RemitTo' 
        WHEN 11 THEN 'Third-Party Shipping Address' 
      END AS AddressDescription
    , ADR.Name
    , ADR.Address
    , ADR.Street
    , ADR.ZipCode
    , ADR.City
    , ADR.State
    , ADR.County
    , ADR.CountryRegionId
    , ADR.Phone
    , ADR.Telefax
    , ADR.Url
FROM 
    dbo.Address ADR
JOIN 
    dbo.DirPartyAddressRelationship1066 DAC 
    ON DAC.RefCompanyId = ADR.DataAreaId 
    AND DAC.AddressRecId = ADR.RecId 
    AND DAC.DataAreaId = ADR.DataAreaId
JOIN 
    dbo.DirPartyAddressRelationship DPR 
    ON DPR.RecId = DAC.PartyAddressRelationshipRecId 
    AND DPR.DataAreaId = DAC.DataAreaId
JOIN 
    dbo.DirPartyTable DPT 
    ON DPT.PartyId = DPR.PartyId 
    AND DPT.DataAreaId = DPR.DataAreaId
JOIN 
    dbo.EmplTable EMP 
    ON DPT.PartyId = EMP.PartyId 
    AND DPT.DataAreaId = EMP.DataAreaId;

Example 3: Simple X++ Query

How about code written in other languages (e.g. X++)?

I tried to transform the definition behind X++ form from Dynamics 365 for Finance without effect

    InventItemPurchSetup.CalendarDays = NoYes::No;
    InventItemPurchSetup.HighestQty = 0.00;
    InventItemPurchSetup.InventDimId = "AllBlank";
    InventItemPurchSetup.InventDimIdDefault = "AllBlank";
    InventItemPurchSetup.ItemId = "1000";
    InventItemPurchSetup.LeadTime = 0;
    InventItemPurchSetup.LowestQty = 0.00;
    InventItemPurchSetup.MandatoryInventLocation = NoYes::No;
    InventItemPurchSetup.MandatoryInventSite = NoYes::No;
    InventItemPurchSetup.MultipleQty = 0.00;
    InventItemPurchSetup.Override = NoYes::No;
    InventItemPurchSetup.StandardQty = 0.00;
    InventItemPurchSetup.Stopped = NoYes::No;
    InventItemPurchSetup.Sequence = 0;
    InventItemPurchSetup.OverrideDefaultStorageDimensions = NoYes::No;
    InventItemPurchSetup.SysRowVersionNumber = 0;
    InventItemPurchSetup.insert();

Multiple attempts were performed without any progress, though after a break I was able to apply the following transformations:

Prompts:

Replace the word "InventItemPurchSetup") with ", IPS." and remove the text after the equality sign.
Add a comma in front of each line and remove the semicolon at the end.

With a few manual change the code becomes:

SELECT IPS.CalendarDays 
, IPS.HighestQty 
, IPS.InventDimId 
, IPS.InventDimIdDefault 
, IPS.ItemId 
, IPS.LeadTime 
, IPS.LowestQty 
, IPS.MandatoryInventLocation 
, IPS.MandatoryInventSite 
, IPS.MultipleQty 
, IPS.Override 
, IPS.StandardQty 
, IPS.Stopped 
, IPS.Sequence 
, IPS.OverrideDefaultStorageDimensions 
, IPS.SysRowVersionNumber 
FROM dbo.InventItemPurchSetup IPs

The prompts reflect what I was doings in Excel manually via a replace, respectively the use of "Text to columns" functionality".
 
Probably, there's the opportunity to test further various ideas. Frankly, the transformation from upper case to camel could have helped me in the past a lot!

Disclaimer:
The queries were not tested after the changes were performed (because of unavailability of the systems).

Happy coding!

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.