SQL Troubles: JOIN

Showing posts with label JOIN. Show all posts

28 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part V: Common Table Expressions for Employee Hierarchies and Interest Rates)

Continuing the series of posts on the use of Copilot for generating simple solutions, in this post let's consider the generation of Common Table Expressions (CTEs) based on several examples.

Disclaimer: The below code was generated by Copilot via prompts used for demonstration purposes!

Use Case 1: Employee Hierarchies

One of the most basic examples in which recursive CTE are used is the traversing of an organization's hierarchy, which can have multiple levels. The following prompts attempt step by step to provide different approaches for traversing the hierarchy.

One should run the prompts individually.

Prompt(s):
Create a query based on a common table expression that traverses a hierarchy in an organization.
Generate an example on which the query can be tested.
Rewrite the queries by placing the comma first.
Rewrite the queries and condense their structure.

Output:

--Creating the Employees Table
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY, EmployeeName VARCHAR(100), ManagerID INT
);

INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID) VALUES
    (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Charlie', 1),
    (4, 'David', 2), (5, 'Eve', 2), (6, 'Frank', 3), (7, 'Grace', 3);


-- CTE Query for Hierarchy Traversal
WITH EmployeeHierarchy AS (
    SELECT EmployeeID, EmployeeName, ManagerID, 0 AS Level
    FROM Employees WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1
    FROM Employees e INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT EmployeeID, EmployeeName, ManagerID, Level
FROM EmployeeHierarchy
ORDER BY Level, EmployeeName;

It would be great if the query would return also Manager's name:

Prompt(s):
Rewrite the final query based on EmployeeHierarchy and condense their structure.

Output:

WITH EmployeeHierarchy AS (
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, 0 AS Level, NULL AS ManagerName
    FROM Employees e WHERE e.ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1, m.EmployeeName
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
    LEFT JOIN Employees m ON e.ManagerID = m.EmployeeID
)
SELECT eh.EmployeeID, eh.EmployeeName, eh.ManagerID, eh.Level, eh.ManagerName
FROM EmployeeHierarchy eh
ORDER BY eh.Level, eh.EmployeeName;

Unfortunately, running the query leads to the following error message:

"Msg 462, Level 16, State 1, Line 3, Outer join is not allowed in the recursive part of a recursive common table expression 'EmployeeHierarchy'."

Prompt:
Rewrite the query to fix the error "Outer join is not allowed in the recursive part of a recursive common table expression 'EmployeeHierarchy'."

Output:

WITH EmployeeHierarchy AS (
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, 0 AS Level
    FROM Employees e WHERE e.ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT eh.EmployeeID, eh.EmployeeName, eh.ManagerID, eh.Level, m.EmployeeName AS ManagerName
FROM EmployeeHierarchy eh
LEFT JOIN Employees m ON eh.ManagerID = m.EmployeeID
ORDER BY eh.Level, eh.EmployeeName;

Now the output looks much better. It will be interesting to see whether Copilot learns in time from such errors. The error and the fact that Copilot was able to generate the proper code after the correction are a bit unexpected.

Prompt:
Generate a sample table for the problem containing five levels. Condense the statements. Generate the statement for dropping the existing table.

Output:

-- Dropping the Existing Table
DROP TABLE IF EXISTS Employees;

-- Creating the Employees Table
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY, EmployeeName VARCHAR(100), ManagerID INT
);

INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID) VALUES
    (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Charlie', 1),
    (4, 'David', 2), (5, 'Eve', 2), (6, 'Frank', 3), (7, 'Grace', 3),
    (8, 'Hannah', 4), (9, 'Ivy', 4), (10, 'Jack', 5), (11, 'Karen', 5),
    (12, 'Leo', 6), (13, 'Mona', 6), (14, 'Nina', 7), (15, 'Oscar', 7),
    (16, 'Paul', 8), (17, 'Quinn', 8), (18, 'Rita', 9), (19, 'Sam', 9),
    (20, 'Tom', 10), (21, 'Uma', 10), (22, 'Vera', 11), (23, 'Will', 11),
    (24, 'Xena', 12), (25, 'Yara', 12), (26, 'Zane', 13), (27, 'Amy', 13),
    (28, 'Ben', 14), (29, 'Cathy', 14), (30, 'Dan', 15), (31, 'Ella', 15);

Prompt:
Modify the last query to include the full path from the Manager to the Employee.

Output:

-- CTE Query for Hierarchy Traversal with Full Path
WITH EmployeeHierarchy AS (
    SELECT 
        e.EmployeeID
        , e.EmployeeName
        , e.ManagerID
        , 0 AS Level
        , CAST(e.EmployeeName AS VARCHAR(MAX)) AS Path
    FROM 
        Employees e 
    WHERE 
        e.ManagerID IS NULL
    UNION ALL
    SELECT 
        e.EmployeeID
        , e.EmployeeName
        , e.ManagerID
        , eh.Level + 1
        , CAST(eh.Path + ' -> ' + e.EmployeeName AS VARCHAR(MAX))
    FROM 
        Employees e
    INNER JOIN 
        EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT 
    eh.EmployeeID
    , eh.EmployeeName
    , eh.ManagerID
    , eh.Level
    , eh.Path
FROM 
    EmployeeHierarchy eh
ORDER BY 
    eh.Level
    , eh.EmployeeName;

The more lines the code has, the more it makes sense to expand the query as above. It would be great if one could apply a set of prompts to an entire session (e.g. always include the schema when calling an object, always format the code in a certain way, etc.).

Use Case 2: Interest Rates

Another often met problem is to calculate the interest rate for several years based on a list of items.

Prompt(s):
Create a query based on a common tables expression that applies interest rate of .05 on a yearly basis to a list of items for the past 8 years with projection over other 2 years. Generate an example on which the query can be tested.
Add distinct dates to the table as starting points for the simulation.
Add the schema name to the statements.

Output:

-- Create the table with initial data
CREATE TABLE finance.items (
    item_id INT PRIMARY KEY,
    item_name VARCHAR(50),
    initial_value DECIMAL(10, 2),
    start_date DATE
);

-- Insert example data into the table
INSERT INTO finance.items (item_id, item_name, initial_value, start_date) VALUES
(1, 'Item A', 1000.00, '2017-01-01'),
(2, 'Item B', 1500.00, '2017-06-01'),
(3, 'Item C', 2000.00, '2018-01-01');

-- CTE to calculate the value over the years
WITH ValueProjection AS (
    SELECT
        item_id,
        item_name,
        initial_value AS value,
        start_date,
        start_date AS projection_date,
        0 AS year
    FROM
        finance.items
    UNION ALL
    SELECT
        vp.item_id,
        vp.item_name,
        vp.value * 1.05 AS value,
        vp.start_date,
        DATEADD(YEAR, 1, vp.projection_date) AS projection_date,
        vp.year + 1 AS year
    FROM
        ValueProjection vp
    WHERE
        vp.year < 9
)
SELECT
    item_id,
    item_name,
    value,
    projection_date
FROM
    ValueProjection
ORDER BY
    item_id,
    projection_date;

In a second attempt, in a second session, Copilot create the logic in the Finance schema (rather than in dbo) and left out the statement for creating it, which is a bit surprising. This can be easily fixed though:

Prompt(s):
Write the statement for creating the schema.

Output

-- Create the schema
CREATE SCHEMA finance;

Trying to run the query lead to the well-known error message:

"Msg 240, Level 16, State 1, Line 3, Types don't match between the anchor and the recursive part in column "value" of recursive query "ValueProjection"."

Prompt(s):
Rewrite the query to fix the error "Msg 240, Level 16, State 1, Line 3, Types don't match between the anchor and the recursive part in column "value" of recursive query "ValueProjection"."

Output

-- CTE to calculate the value over the years
WITH ValueProjection AS (
    SELECT
        item_id,
        item_name,
        CAST(initial_value AS DECIMAL(10, 2)) AS value,
        start_date,
        start_date AS projection_date,
        0 AS year
    FROM
        finance.items
    UNION ALL
    SELECT
        vp.item_id,
        vp.item_name,
        CAST(vp.value * 1.05 AS DECIMAL(10, 2)) AS value,
        vp.start_date,
        DATEADD(YEAR, 1, vp.projection_date) AS projection_date,
        vp.year + 1 AS year
    FROM
        ValueProjection vp
    WHERE
        vp.year < 9
)
SELECT
    item_id,
    item_name,
    value,
    projection_date
FROM
    ValueProjection
ORDER BY
    item_id,
    projection_date;

This time the query runs successfully.

CTEs are useful for solving many problems, though are there alternatives? Copilot used a loop for obtaining the same result:

Prompt(s):
Write a query to obtain the same result without the use of common table expressions.

Output:

-- Create a temporary table to store the projections
CREATE TABLE #ValueProjection (
    item_id INT,
    item_name VARCHAR(50),
    value DECIMAL(10, 2),
    projection_date DATE,
    year INT
);

-- Insert initial values into the temporary table
INSERT INTO #ValueProjection (item_id, item_name, value, projection_date, year)
SELECT
    item_id,
    item_name,
    initial_value,
    start_date,
    0
FROM
    finance.items;

-- Loop to calculate the value over the years
DECLARE @year INT = 1;
WHILE @year <= 9
BEGIN
    INSERT INTO #ValueProjection (item_id, item_name, value, projection_date, year)
    SELECT
        vp.item_id,
        vp.item_name,
        CAST(vp.value * 1.05 AS DECIMAL(10, 2)) AS value,
        DATEADD(YEAR, 1, vp.projection_date) AS projection_date,
        @year
    FROM
        #ValueProjection vp
    WHERE
        vp.year = @year - 1;

    SET @year = @year + 1;
END;

-- Select the final results
SELECT
    item_id,
    item_name,
    value,
    projection_date
FROM
    #ValueProjection
ORDER BY
    item_id,
    projection_date;

-- Drop the temporary table
DROP TABLE #ValueProjection;

The query seems to be correct, despite the fact that in a previous session the dates were not correctly considered, existing bigger gaps in between than 1 year for some records. Attempts to replicate the issue were unsuccessful, though this behavior is strange. One should always test the queries thoroughly, especially when they were generated by Copilot!

It's important to always consider adding the schema to the objects created, and this should be part of the initial prompt!

One can further increase the complexity of the query.

One should try also to see if further solutions exist. For example, the next prompt generated a solution based on UNIONs, which usually is not an effective solution even if it gets the job done!

Prompt:
Write a query to obtain the same result without the use of common table expressions or loops.

Output:

-- Query to project values over 10 years
SELECT
    item_id,
    item_name,
    initial_value AS value,
    start_date AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * 1.05 AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 1, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 2) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 2, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 3) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 3, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 4) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 4, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 5) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 5, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 6) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 6, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 7) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 7, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 8) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 8, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 9) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 9, start_date) AS projection_date
FROM
    finance.items
-- Order the results
ORDER BY
    item_id,
    projection_date;

Are these all possible solutions? One can test this via the following prompt which can be used again until all solutions were exhausted:

Prompt(s):
Write a query to obtain the same result without the use of the previous methods used above.

Output:

-- Use a cross join to generate the projections for each year.
-- Create a numbers table to represent the years
CREATE TABLE finance.numbers (
    n INT PRIMARY KEY
);

-- Insert numbers from 0 to 9 into the numbers table
INSERT INTO finance.numbers (n) VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- Query to project values over 10 years
SELECT
    i.item_id,
    i.item_name,
    CAST(i.initial_value * POWER(1.05, n.n) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, n.n, i.start_date) AS projection_date
FROM
    finance.items i
CROSS JOIN
    finance.numbers n
ORDER BY
    i.item_id,
    n.n;

Frankly, this is the most compact solution! I used variations of it in other contexts via a table created on the fly or a table-valued function. The solution can be easily extended to consider different rates.

Let's try another prompt:

Prompt:
Extend the last query to include different rates for each year that increase or decrease by 0.01 percent.

Output:

-- Query to project values over 10 years with varying interest rates

SELECT
    i.item_id,
    i.item_name,
    CAST(i.initial_value * EXP(SUM(LOG(1 + 0.05 + (n.n * 0.01)))) OVER (PARTITION BY i.item_id ORDER BY n.n) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, n.n, i.start_date) AS projection_date
FROM
    finance.items i
CROSS JOIN
    finance.numbers n
ORDER BY
    i.item_id,
    n.n;

Unfortunately, this led again to an error:

"Msg 4113, Level 15, State 6, Line 5, The function 'EXP' is not a valid windowing function, and cannot be used with the OVER clause."

Though, it can be fixed with the prompt:

Prompt:
Fix the error "Msg 4113, Level 15, State 6, Line 5, The function 'EXP' is not a valid windowing function, and cannot be used with the OVER clause."

Output:

-- Query to project values over 10 years with varying interest rates
SELECT
    i.item_id,
    i.item_name,
    CAST(i.initial_value * (
        SELECT EXP(SUM(LOG(1 + 0.05 + (n2.n * 0.01))))
        FROM finance.numbers n2
        WHERE n2.n <= n.n
    ) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, n.n, i.start_date) AS projection_date
FROM
    finance.items i
CROSS JOIN
    finance.numbers n
ORDER BY
    i.item_id,
    n.n;

I used the same solution in a previous post, and it isn't trivial at all, even if the trick is simple!

Is this everything? Probably, not! Running again the first statement from the first section in a new session led to the following query, which can be transposed if needed:

-- Query with CTE
WITH InterestProjection AS (
    SELECT
        item_id,
        item_name,
        start_date,
        initial_value,
        initial_value * POWER(1.05, 0) AS value_year_0,
        initial_value * POWER(1.05, 1) AS value_year_1,
        initial_value * POWER(1.05, 2) AS value_year_2,
        initial_value * POWER(1.05, 3) AS value_year_3,
        initial_value * POWER(1.05, 4) AS value_year_4,
        initial_value * POWER(1.05, 5) AS value_year_5,
        initial_value * POWER(1.05, 6) AS value_year_6,
        initial_value * POWER(1.05, 7) AS value_year_7,
        initial_value * POWER(1.05, 8) AS value_year_8,
        initial_value * POWER(1.05, 9) AS value_year_9,
        initial_value * POWER(1.05, 10) AS value_year_10
    FROM
        finance.items
)
SELECT
    item_id,
    item_name,
    start_date,
    value_year_0,
    value_year_1,
    value_year_2,
    value_year_3,
    value_year_4,
    value_year_5,
    value_year_6,
    value_year_7,
    value_year_8,
    value_year_9,
    value_year_10
FROM
    InterestProjection;

One should rerun the prompt also when it appears that all the possible solutions were exhausted!

Prompt(s):
Write a query to obtain the same result without the use of the previous methods used above.

Output:

WITH InterestProjection AS (
    SELECT
        item_id,
        item_name,
        start_date,
        initial_value,
        CAST(initial_value AS DECIMAL(10, 2)) AS value,
        0 AS year
    FROM
        finance.items
    UNION ALL
    SELECT
        ip.item_id,
        ip.item_name,
        ip.start_date,
        ip.initial_value,
        CAST(ip.value * 1.05 AS DECIMAL(10, 2)) AS value,
        ip.year + 1 AS year
    FROM
        InterestProjection ip
    WHERE
        ip.year < 10
)
SELECT
    item_id,
    item_name,
    start_date,
    MAX(CASE WHEN year = 0 THEN value END) AS value_year_0,
    MAX(CASE WHEN year = 1 THEN value END) AS value_year_1,
    MAX(CASE WHEN year = 2 THEN value END) AS value_year_2,
    MAX(CASE WHEN year = 3 THEN value END) AS value_year_3,
    MAX(CASE WHEN year = 4 THEN value END) AS value_year_4,
    MAX(CASE WHEN year = 5 THEN value END) AS value_year_5,
    MAX(CASE WHEN year = 6 THEN value END) AS value_year_6,
    MAX(CASE WHEN year = 7 THEN value END) AS value_year_7,
    MAX(CASE WHEN year = 8 THEN value END) AS value_year_8,
    MAX(CASE WHEN year = 9 THEN value END) AS value_year_9,
    MAX(CASE WHEN year = 10 THEN value END) AS value_year_10
FROM
    InterestProjection
GROUP BY
    item_id, item_name, start_date;

Copilot missed again to cast the value in the CTE to decimal, respectively introduced the RECURSIVE keyword which had to be removed, though it worked after the chances! Frankly, database engines should be smart enough to cast values in similar cases (though this may probably lead to curious cases).

Further attempts led to the creation of a stored procedure to encapsulate the logic, respectively to other interesting variations. Though, after some time, the solutions started to reappear, even if the differences are minor.

Conclusion

What can be learned from this exercise? All the roads lead to Rome, or at least it seems to be the case in this post considering that multiple methods could be used to solve the same problem. More important, one shouldn't stop exploring just because a solution is already available!

Looking forward at the use of Copilot, ideally, one should formulate the simplest pattern and start exploring incrementally by changing the initial parameters, approaching the problem from different perspectives. Copilot might not solve all problems, though it's good at finding alternatives.

Happy coding!

Previous Post <<||>> Next Post

SQL Troubles

Pages

28 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part V: Common Table Expressions for Employee Hierarchies and Interest Rates)

About Me