Showing posts with label queries. Show all posts
Showing posts with label queries. Show all posts

12 March 2025

🏭🎗️🗒️Microsoft Fabric: Query Acceleration [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 12-Mar-2025

Query Acceleration
Query Acceleration [2]

[Microsoft Fabric] Query Acceleration

  • {def} 
    • indexes and caches data landing in OneLake on the fly
      • {benefit} allows to 
        • analyze real-time streams coming directly into Eventhouse and 
        • combine it with data landing in OneLake either coming from mirrored databases, Warehouses, Lakehouses or Spark [2] 
        • ⇒ accelerate data landing in OneLake
          • ⇐ including existing data and any new updates, and expect similar performance [1]
          • eliminates the need to 
            • manage ingestion pipelines [1]
            • maintain duplicate copies of data [1]
          • ensures that data remains in sync without additional effort [4]
          • the initial process is dependent on the size of the external table [4]
      • ⇐ provides significant performance comparable to ingesting data in Eventhouse [1]
        • in some cases up to 50x and beyond [2]
      • ⇐ supported in Eventhouse over delta tables from OneLake shortcuts, etc. [4]
        • when creating a shortcut from an Eventhouse to a OneLake delta table, users can choose if they want to accelerate the shortcut [2]
        • accelerating the shortcut means equivalent ingestion into the Eventhouse: indexing, caching, other optimizations that deliver the same level of performance for accelerated shortcuts as native Eventhouse tables [2]
      • all data management is done by the data writer and in the Eventhouse the accelerated table shortcut [2]  
      • behave like external tables, with the same limitations and capabilities [4]
        • {limitation} materialized view aren't supported [1]
        • {limitation} update policies aren't supported [1]
    • allows specifying a policy on top of external delta tables that defines the number of days to cache data for high-performance queries [1]
      • ⇐ queries run over OneLake shortcuts can be less performant than on data that is ingested directly to Eventhouses [1]
        • ⇐ due to network calls to fetch data from storage, the absence of indexes, etc. [1]
    • {costs} charged under OneLake Premium cache meter [2]
      • ⇐ similar to native Eventhouse tables [2]
      • one can control the amount of data to accelerate by configuring number of days to cache [2]
      • indexing activity may also count towards CU consumption [2]
    • {limitation} the number of columns in the external table can't exceed 900 [1]
    • {limitation} query performance over accelerated external delta tables which have partitions may not be optimal during preview [1]
    • {limitation} the feature assumes delta tables with static advanced features
      • e.g. column mapping doesn't change, partitions don't change, etc
      • {recommendation} to change advanced features, first disable the policy, and once the change is made, re-enable the policy [1]
    • {limitation} schema changes on the delta table must also be followed with the respective .alter external delta table schema [1]
      • might result in acceleration starting from scratch if there was breaking schema change [1]
    • {limitation} index-based pruning isn't supported for partitions [1]
    • {limitation} parquet files with a compressed size higher than 6 GB won't be cached [1]

References:
[1] Microsoft Learn (2024) Fabric: Query acceleration for OneLake shortcuts - overview (preview) [link]
[2] Microsoft Fabric Updates Blog (2024) Announcing Eventhouse Query Acceleration for OneLake Shortcuts (Preview) [link]
[3] Microsoft Learn (2024) Fabric: Query acceleration over OneLake shortcuts (preview) [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

28 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part V: Common Table Expressions for Employee Hierarchies and Interest Rates)

Continuing the series of posts on the use of Copilot for generating simple solutions, in this post let's consider the generation of Common Table Expressions (CTEs) based on several examples.

Disclaimer: The below code was generated by Copilot via prompts used for demonstration purposes!

Use Case 1: Employee Hierarchies

One of the most basic examples in which recursive CTE are used is the traversing of an organization's hierarchy, which can have multiple levels. The following prompts attempt step by step to provide different approaches for traversing the hierarchy. 

One should run the prompts individually.

Prompt(s):
Create a query based on a common table expression that traverses a hierarchy in an organization.
Generate an example on which the query can be tested.
Rewrite the queries by placing the comma first.
Rewrite the queries and condense their structure.

Output:

--Creating the Employees Table
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY, EmployeeName VARCHAR(100), ManagerID INT
);

INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID) VALUES
    (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Charlie', 1),
    (4, 'David', 2), (5, 'Eve', 2), (6, 'Frank', 3), (7, 'Grace', 3);


-- CTE Query for Hierarchy Traversal
WITH EmployeeHierarchy AS (
    SELECT EmployeeID, EmployeeName, ManagerID, 0 AS Level
    FROM Employees WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1
    FROM Employees e INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT EmployeeID, EmployeeName, ManagerID, Level
FROM EmployeeHierarchy
ORDER BY Level, EmployeeName;

It would be great if the query would return also Manager's name:

Prompt(s):
Rewrite the final query based on EmployeeHierarchy and condense their structure.

Output:

WITH EmployeeHierarchy AS (
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, 0 AS Level, NULL AS ManagerName
    FROM Employees e WHERE e.ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1, m.EmployeeName
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
    LEFT JOIN Employees m ON e.ManagerID = m.EmployeeID
)
SELECT eh.EmployeeID, eh.EmployeeName, eh.ManagerID, eh.Level, eh.ManagerName
FROM EmployeeHierarchy eh
ORDER BY eh.Level, eh.EmployeeName;

Unfortunately, running the query leads to the following error message:
"Msg 462, Level 16, State 1, Line 3, Outer join is not allowed in the recursive part of a recursive common table expression 'EmployeeHierarchy'."

Prompt:
Rewrite the query to fix the error "Outer join is not allowed in the recursive part of a recursive common table expression 'EmployeeHierarchy'."

Output:

WITH EmployeeHierarchy AS (
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, 0 AS Level
    FROM Employees e WHERE e.ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Level + 1
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT eh.EmployeeID, eh.EmployeeName, eh.ManagerID, eh.Level, m.EmployeeName AS ManagerName
FROM EmployeeHierarchy eh
LEFT JOIN Employees m ON eh.ManagerID = m.EmployeeID
ORDER BY eh.Level, eh.EmployeeName;

Now the output looks much better. It will be interesting to see whether Copilot learns in time from such errors. The error and the fact that Copilot was able to generate the proper code after the correction are a bit unexpected.

Prompt:
Generate a sample table for the problem containing five levels. Condense the statements. Generate the statement for dropping the existing table.

Output:

-- Dropping the Existing Table
DROP TABLE IF EXISTS Employees;

-- Creating the Employees Table
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY, EmployeeName VARCHAR(100), ManagerID INT
);

INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID) VALUES
    (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Charlie', 1),
    (4, 'David', 2), (5, 'Eve', 2), (6, 'Frank', 3), (7, 'Grace', 3),
    (8, 'Hannah', 4), (9, 'Ivy', 4), (10, 'Jack', 5), (11, 'Karen', 5),
    (12, 'Leo', 6), (13, 'Mona', 6), (14, 'Nina', 7), (15, 'Oscar', 7),
    (16, 'Paul', 8), (17, 'Quinn', 8), (18, 'Rita', 9), (19, 'Sam', 9),
    (20, 'Tom', 10), (21, 'Uma', 10), (22, 'Vera', 11), (23, 'Will', 11),
    (24, 'Xena', 12), (25, 'Yara', 12), (26, 'Zane', 13), (27, 'Amy', 13),
    (28, 'Ben', 14), (29, 'Cathy', 14), (30, 'Dan', 15), (31, 'Ella', 15);

Prompt:
Modify the last query to include the full path from the Manager to the Employee.

Output:

-- CTE Query for Hierarchy Traversal with Full Path
WITH EmployeeHierarchy AS (
    SELECT 
        e.EmployeeID
        , e.EmployeeName
        , e.ManagerID
        , 0 AS Level
        , CAST(e.EmployeeName AS VARCHAR(MAX)) AS Path
    FROM 
        Employees e 
    WHERE 
        e.ManagerID IS NULL
    UNION ALL
    SELECT 
        e.EmployeeID
        , e.EmployeeName
        , e.ManagerID
        , eh.Level + 1
        , CAST(eh.Path + ' -> ' + e.EmployeeName AS VARCHAR(MAX))
    FROM 
        Employees e
    INNER JOIN 
        EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT 
    eh.EmployeeID
    , eh.EmployeeName
    , eh.ManagerID
    , eh.Level
    , eh.Path
FROM 
    EmployeeHierarchy eh
ORDER BY 
    eh.Level
    , eh.EmployeeName;

The more lines the code has, the more it makes sense to expand the query as above. It would be great if one could apply a set of prompts to an entire session (e.g. always include the schema when calling an object, always format the code in a certain way, etc.).

Use Case 2: Interest Rates

Another often met problem is to calculate the interest rate for several years based on a list of items. 

Prompt(s):
Create a query based on a common tables expression that applies interest rate of .05 on a yearly basis to a list of items for the past 8 years with projection over other 2 years. Generate an example on which the query can be tested.
Add distinct dates to the table as starting points for the simulation.
Add the schema name to the statements.

Output:

-- Create the table with initial data
CREATE TABLE finance.items (
    item_id INT PRIMARY KEY,
    item_name VARCHAR(50),
    initial_value DECIMAL(10, 2),
    start_date DATE
);

-- Insert example data into the table
INSERT INTO finance.items (item_id, item_name, initial_value, start_date) VALUES
(1, 'Item A', 1000.00, '2017-01-01'),
(2, 'Item B', 1500.00, '2017-06-01'),
(3, 'Item C', 2000.00, '2018-01-01');

-- CTE to calculate the value over the years
WITH ValueProjection AS (
    SELECT
        item_id,
        item_name,
        initial_value AS value,
        start_date,
        start_date AS projection_date,
        0 AS year
    FROM
        finance.items
    UNION ALL
    SELECT
        vp.item_id,
        vp.item_name,
        vp.value * 1.05 AS value,
        vp.start_date,
        DATEADD(YEAR, 1, vp.projection_date) AS projection_date,
        vp.year + 1 AS year
    FROM
        ValueProjection vp
    WHERE
        vp.year < 9
)
SELECT
    item_id,
    item_name,
    value,
    projection_date
FROM
    ValueProjection
ORDER BY
    item_id,
    projection_date;

In a second attempt, in a second session, Copilot create the logic in the Finance schema (rather than in dbo) and left out the statement for creating it, which is a bit surprising. This can be easily fixed though:

Prompt(s):
Write the statement for creating the schema. 

Output

-- Create the schema
CREATE SCHEMA finance;

Trying to run the query lead to the well-known error message:

"Msg 240, Level 16, State 1, Line 3, Types don't match between the anchor and the recursive part in column "value" of recursive query "ValueProjection"."

Prompt(s):
Rewrite the query to fix the error "Msg 240, Level 16, State 1, Line 3, Types don't match between the anchor and the recursive part in column "value" of recursive query "ValueProjection"."

Output

-- CTE to calculate the value over the years
WITH ValueProjection AS (
    SELECT
        item_id,
        item_name,
        CAST(initial_value AS DECIMAL(10, 2)) AS value,
        start_date,
        start_date AS projection_date,
        0 AS year
    FROM
        finance.items
    UNION ALL
    SELECT
        vp.item_id,
        vp.item_name,
        CAST(vp.value * 1.05 AS DECIMAL(10, 2)) AS value,
        vp.start_date,
        DATEADD(YEAR, 1, vp.projection_date) AS projection_date,
        vp.year + 1 AS year
    FROM
        ValueProjection vp
    WHERE
        vp.year < 9
)
SELECT
    item_id,
    item_name,
    value,
    projection_date
FROM
    ValueProjection
ORDER BY
    item_id,
    projection_date;

This time the query runs successfully.

CTEs are useful for solving many problems, though are there alternatives? Copilot used a loop for obtaining the same result:

Prompt(s):
Write a query to obtain the same result without the use of common table expressions.

Output:

-- Create a temporary table to store the projections
CREATE TABLE #ValueProjection (
    item_id INT,
    item_name VARCHAR(50),
    value DECIMAL(10, 2),
    projection_date DATE,
    year INT
);

-- Insert initial values into the temporary table
INSERT INTO #ValueProjection (item_id, item_name, value, projection_date, year)
SELECT
    item_id,
    item_name,
    initial_value,
    start_date,
    0
FROM
    finance.items;

-- Loop to calculate the value over the years
DECLARE @year INT = 1;
WHILE @year <= 9
BEGIN
    INSERT INTO #ValueProjection (item_id, item_name, value, projection_date, year)
    SELECT
        vp.item_id,
        vp.item_name,
        CAST(vp.value * 1.05 AS DECIMAL(10, 2)) AS value,
        DATEADD(YEAR, 1, vp.projection_date) AS projection_date,
        @year
    FROM
        #ValueProjection vp
    WHERE
        vp.year = @year - 1;

    SET @year = @year + 1;
END;

-- Select the final results
SELECT
    item_id,
    item_name,
    value,
    projection_date
FROM
    #ValueProjection
ORDER BY
    item_id,
    projection_date;

-- Drop the temporary table
DROP TABLE #ValueProjection;

The query seems to be correct, despite the fact that in a previous session the dates were not correctly considered, existing bigger gaps in between than 1 year for some records. Attempts to replicate the issue were unsuccessful, though this behavior is strange. One should always test the queries thoroughly, especially when they were generated by Copilot!

It's important to always consider adding the schema to the objects created, and this should be part of the initial prompt!

One can further increase the complexity of the query. 

One should try also to see if further solutions exist. For example, the next prompt generated a solution based on UNIONs, which usually is not an effective solution even if it gets the job done!

Prompt:
Write a query to obtain the same result without the use of common table expressions or loops.

Output:

-- Query to project values over 10 years
SELECT
    item_id,
    item_name,
    initial_value AS value,
    start_date AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * 1.05 AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 1, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 2) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 2, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 3) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 3, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 4) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 4, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 5) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 5, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 6) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 6, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 7) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 7, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 8) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 8, start_date) AS projection_date
FROM
    finance.items
UNION ALL
SELECT
    item_id,
    item_name,
    CAST(initial_value * POWER(1.05, 9) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, 9, start_date) AS projection_date
FROM
    finance.items
-- Order the results
ORDER BY
    item_id,
    projection_date;

Are these all possible solutions? One can test this via the following prompt which can be used again until all solutions were exhausted:

Prompt(s):
Write a query to obtain the same result without the use of the previous methods used above.

Output:

-- Use a cross join to generate the projections for each year.
-- Create a numbers table to represent the years
CREATE TABLE finance.numbers (
    n INT PRIMARY KEY
);

-- Insert numbers from 0 to 9 into the numbers table
INSERT INTO finance.numbers (n) VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- Query to project values over 10 years
SELECT
    i.item_id,
    i.item_name,
    CAST(i.initial_value * POWER(1.05, n.n) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, n.n, i.start_date) AS projection_date
FROM
    finance.items i
CROSS JOIN
    finance.numbers n
ORDER BY
    i.item_id,
    n.n;

Frankly, this is the most compact solution! I used variations of it in other contexts via a table created on the fly or a table-valued function. The solution can be easily extended to consider different rates.

Let's try another prompt:

Prompt:
Extend the last query to include different rates for each year that increase or decrease by 0.01 percent.

Output:

-- Query to project values over 10 years with varying interest rates 
SELECT
    i.item_id,
    i.item_name,
    CAST(i.initial_value * EXP(SUM(LOG(1 + 0.05 + (n.n * 0.01)))) OVER (PARTITION BY i.item_id ORDER BY n.n) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, n.n, i.start_date) AS projection_date
FROM
    finance.items i
CROSS JOIN
    finance.numbers n
ORDER BY
    i.item_id,
    n.n;

Unfortunately, this led again to an error:

"Msg 4113, Level 15, State 6, Line 5, The function 'EXP' is not a valid windowing function, and cannot be used with the OVER clause."

Though, it can be fixed with the prompt:

Prompt:
Fix the error "Msg 4113, Level 15, State 6, Line 5, The function 'EXP' is not a valid windowing function, and cannot be used with the OVER clause."

Output:

-- Query to project values over 10 years with varying interest rates
SELECT
    i.item_id,
    i.item_name,
    CAST(i.initial_value * (
        SELECT EXP(SUM(LOG(1 + 0.05 + (n2.n * 0.01))))
        FROM finance.numbers n2
        WHERE n2.n <= n.n
    ) AS DECIMAL(10, 2)) AS value,
    DATEADD(YEAR, n.n, i.start_date) AS projection_date
FROM
    finance.items i
CROSS JOIN
    finance.numbers n
ORDER BY
    i.item_id,
    n.n;

I used the same solution in a previous post, and it isn't trivial at all, even if the trick is simple!

Is this everything? Probably, not! Running again the first statement from the first section in a new session led to the following query, which can be transposed if needed:

-- Query with CTE
WITH InterestProjection AS (
    SELECT
        item_id,
        item_name,
        start_date,
        initial_value,
        initial_value * POWER(1.05, 0) AS value_year_0,
        initial_value * POWER(1.05, 1) AS value_year_1,
        initial_value * POWER(1.05, 2) AS value_year_2,
        initial_value * POWER(1.05, 3) AS value_year_3,
        initial_value * POWER(1.05, 4) AS value_year_4,
        initial_value * POWER(1.05, 5) AS value_year_5,
        initial_value * POWER(1.05, 6) AS value_year_6,
        initial_value * POWER(1.05, 7) AS value_year_7,
        initial_value * POWER(1.05, 8) AS value_year_8,
        initial_value * POWER(1.05, 9) AS value_year_9,
        initial_value * POWER(1.05, 10) AS value_year_10
    FROM
        finance.items
)
SELECT
    item_id,
    item_name,
    start_date,
    value_year_0,
    value_year_1,
    value_year_2,
    value_year_3,
    value_year_4,
    value_year_5,
    value_year_6,
    value_year_7,
    value_year_8,
    value_year_9,
    value_year_10
FROM
    InterestProjection;

One should rerun the prompt also when it appears that all the possible solutions were exhausted!

Prompt(s):
Write a query to obtain the same result without the use of the previous methods used above.

Output:

WITH InterestProjection AS (
    SELECT
        item_id,
        item_name,
        start_date,
        initial_value,
        CAST(initial_value AS DECIMAL(10, 2)) AS value,
        0 AS year
    FROM
        finance.items
    UNION ALL
    SELECT
        ip.item_id,
        ip.item_name,
        ip.start_date,
        ip.initial_value,
        CAST(ip.value * 1.05 AS DECIMAL(10, 2)) AS value,
        ip.year + 1 AS year
    FROM
        InterestProjection ip
    WHERE
        ip.year < 10
)
SELECT
    item_id,
    item_name,
    start_date,
    MAX(CASE WHEN year = 0 THEN value END) AS value_year_0,
    MAX(CASE WHEN year = 1 THEN value END) AS value_year_1,
    MAX(CASE WHEN year = 2 THEN value END) AS value_year_2,
    MAX(CASE WHEN year = 3 THEN value END) AS value_year_3,
    MAX(CASE WHEN year = 4 THEN value END) AS value_year_4,
    MAX(CASE WHEN year = 5 THEN value END) AS value_year_5,
    MAX(CASE WHEN year = 6 THEN value END) AS value_year_6,
    MAX(CASE WHEN year = 7 THEN value END) AS value_year_7,
    MAX(CASE WHEN year = 8 THEN value END) AS value_year_8,
    MAX(CASE WHEN year = 9 THEN value END) AS value_year_9,
    MAX(CASE WHEN year = 10 THEN value END) AS value_year_10
FROM
    InterestProjection
GROUP BY
    item_id, item_name, start_date;

Copilot missed again to cast the value in the CTE to decimal, respectively introduced the RECURSIVE keyword which had to be removed, though it worked after the chances! Frankly, database engines should be smart enough to cast values in similar cases (though this may probably lead to curious cases). 

Further attempts led to the creation of a stored procedure to encapsulate the logic, respectively to other interesting variations. Though, after some time, the solutions started to reappear, even if the differences are minor. 

Conclusion

What can be learned from this exercise? All the roads lead to Rome, or at least it seems to be the case in this post considering that multiple methods could be used to solve the same problem. More important, one shouldn't stop exploring just because a solution is already available! 

Looking forward at the use of Copilot, ideally, one should formulate the simplest pattern and start exploring incrementally by changing the initial parameters, approaching the problem from different perspectives. Copilot might not solve all problems, though it's good at finding alternatives. 

Happy coding!

Previous Post <<||>> Next Post

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.