
24 April 2025

💎🏭SQL Reloaded: Microsoft Fabric's Lakehouses at Work (Part I: Proof-of-Concept)


Introduction

One way to work with the data files existing in an organization is to import them into a lakehouse and build a data model based on them that can be reused in the various solutions (incl. Power BI). As a reminder, a lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location.

The tutorials available on lakehouses are pretty useful for getting an idea of how to start, though probably each seasoned professional has his/her own way of doing things, at least for testing the capabilities before building a proper solution. The target is thus to create the minimum needed for testing the capabilities as part of a proof-of-concept solution.

The files used in this post are available on Microsoft's GitHub. Download the files and store them where they are easily available for the next steps. The following files were considered for the current post: customers.csv, orders.csv and products.csv.

Create the Lakehouse

It's recommended to have a naming convention in place for the various items created in a workspace, e.g. a combination of item type (lakehouse, warehouse), system type (Prod, UAT, Dev, PoC) and eventually department (e.g. FIN, SCM, HR, etc.). One should try to balance meaning and usefulness. Ideally, one should use 2 to maximum 4 letters for each part encoded in the name. For example, the following scripts were created in the LH_SCM_PoC lakehouse.

More complex naming conventions can include the source system (e.g. D365, CRM, EBS) or the company. The target is to easily identify the systems, independently of how complex the rules are. Given that it can become challenging to rename the schemas afterwards, the naming convention should ideally be in place from the early stages.
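For illustration, a few hypothetical names following such a convention: LH_SCM_PoC (lakehouse, Supply Chain, proof-of-concept), WH_FIN_Dev (warehouse, Finance, development), respectively LH_D365_SCM_Prod (lakehouse, D365 as source system, Supply Chain, production).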

Create the Schema

A lakehouse comes with a dbo schema available by default, though it's recommended to create meaningful schema(s) as needed. The schemas should ideally reflect the domain of the data (e.g. departments or other key areas) and shouldn't change when the objects are deployed between the different environments. Depending on the case, one should consider creating multiple schemas that reflect the model's key areas. The names should be simple and suggestive.

-- create schema
CREATE SCHEMA Orders

Create a Staging Area

The next step is to create a staging area where the files in scope can be made available and then further loaded into the lakehouse. One needs to compromise between creating a deep hierarchical structure that reflects the business structure and the need to easily identify, respectively manage the files. A hierarchical structure with 1-2 levels could provide the needed compromise, though each additional level tends to increase the complexity.

One should also consider rules for archiving or even deleting the files.

Upload the Files

Microsoft Fabric allows users to upload multiple files together in a single step. Ideally the files should have proper names for each column, otherwise overheads deriving from this may appear later in the process.

When the files are available in multiple folders in a complex hierarchical structure, a set of shortcuts could help in their management.

Load the Data

A file's data can be loaded into the lakehouse on the fly by providing a valid table name:
Files >> SCM_Orders >> (select file) >> Load to Tables >> new table >> Load file to new table >> (provide information) >> Load

Load file to new table


Of course, the table's name must be unique within the schema and the further properties must match the file's definition.

One should consider loading first a couple of tables, performing a rough validation of the imported data, and only after that importing the remaining tables. This helps identify early the issues that typically lead to data reimports (wrong formatting, invalid column names, duplicated files, etc.) or rework.

If the files have different characteristics (e.g. delimiters, number of attributes/records, special data types), one should consider this in the initial scope and have at least one example from each category. 

Review the Metadata

Once the files are available, there's the tendency to start directly with the development without analyzing the data or, equally important, the metadata available. To review the metadata of the newly created tables, one can use the objects from the standard INFORMATION_SCHEMA (see post):

-- retrieve the list of tables
SELECT * 
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'orders'
ORDER BY TABLE_NAME

Further on, one can review columns' definition:
 
-- retrieve column metadata
SELECT TABLE_CATALOG
, TABLE_SCHEMA
, TABLE_NAME
, COLUMN_NAME
, ORDINAL_POSITION
, DATA_TYPE
, CHARACTER_MAXIMUM_LENGTH
, NUMERIC_PRECISION
, NUMERIC_SCALE
, DATETIME_PRECISION
, CHARACTER_SET_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'orders'
ORDER BY TABLE_NAME
, ORDINAL_POSITION

It's a good idea to save the metadata to a file and use it later for reviews, respectively for metadata management, when no other solution (e.g. Purview) is in place for this purpose. That's useful also for people with limited or no access to the workspace.

Alternatively, one can use a notebook with the following SQL-based commands: 

%%sql

DESCRIBE TABLE LH_SCM_PoC.orders.sales;

DESCRIBE TABLE EXTENDED LH_SCM_PoC.orders.sales;

One can even provide meaningful descriptions for each table and its attributes via scripts like the ones below, however it might be a good idea to do this in the later phases of the PoC, when the logic becomes stable:

%%sql

-- modify a table's COMMENT
COMMENT ON TABLE LH_SCM_PoC.orders.sales IS 'Customer sales orders';

-- modify columns' COMMENT for an existing table
ALTER TABLE LH_SCM_PoC.orders.sales  
ALTER COLUMN SalesOrderNumber COMMENT 'Sales Order Number';

Data Validation

Before diving into building any business logic, besides identifying the primary keys, foreign keys and further attributes used in bringing the data together, it's recommended to get an overview of the data's intrinsic and extrinsic characteristics relevant to the analysis. Some of the rules typically used for studying the quality of data apply to some extent here as well, though one needs to prioritize accordingly, otherwise one replicates the effort that's typically part of the Data Management initiatives.

In addition, it's important to check how much the identified issues impact the business logic, respectively whether the issues can be corrected to match the expectations. Often, no logic can compensate for major data quality issues, and this can also affect the PoC's results as soon as the outcomes are validated against the expectations!

Data Understanding 

Further on, it makes sense to get a high-level understanding of the data by looking at the distribution of values, respectively at the records participating in the joins. Of course, more similar queries can be built, though again, one should try to focus on the most important aspects!

The analysis could for example consider the following points:

/* validation of Products */

-- review duplicated product names (should be 0)
SELECT ProductName
, count(*) RecordCount
FROM orders.products
GROUP BY ProductName
HAVING count(*)>1

-- review most (in)expensive products
SELECT top 100 ProductID
, ProductName
, Category
, ListPrice 
FROM orders.products
ORDER BY ListPrice DESC --ASC

-- review category distribution
SELECT Category
, count(*) RecordCount 
FROM orders.products
GROUP BY Category
ORDER BY RecordCount DESC

-- review price ranges (by order of magnitude)
SELECT Len(floor(ListPrice)) PriceMagnitude
, count(*) RecordCount 
FROM orders.products
GROUP BY Len(floor(ListPrice)) 
ORDER BY PriceMagnitude DESC

/* validation of Customers */

-- duplicated email addresses 
SELECT CST.CustomerID
, CST.FirstName
, CST.LastName 
, CST.EmailAddress 
, DUP.RecordCount
FROM (-- duplicates
	SELECT EmailAddress
	, count(*) RecordCount 
	FROM orders.customers 
	GROUP BY EmailAddress 
	HAVING count(*)>1
	) DUP
	JOIN orders.customers CST
	   ON DUP.EmailAddress = CST.EmailAddress
ORDER BY DUP.RecordCount DESC
, DUP.EmailAddress 

-- duplicated Customer names (not necessarily duplicates)
SELECT CST.CustomerID
, CST.FirstName
, CST.LastName 
, CST.EmailAddress 
, DUP.RecordCount
FROM (-- duplicates
	SELECT FirstName
	, LastName
	, count(*) RecordCount 
	FROM orders.customers 
	GROUP BY FirstName
	, LastName 
	HAVING count(*)>1
	) DUP
	JOIN orders.customers CST
	   ON DUP.FirstName = CST.FirstName
      AND DUP.LastName = CST.LastName
ORDER BY DUP.RecordCount DESC
, DUP.FirstName
, DUP.LastName

/* validation of Orders */

-- review a typical order
SELECT SalesOrderID
, OrderDate
, CustomerID
, LineItem
, ProductID
, OrderQty
, LineItemTotal
FROM orders.orders
WHERE SalesOrderID = 71780
ORDER BY SalesOrderID 
, LineItem

-- review orders' distribution by month
SELECT Year(OrderDate) Year
, Month(OrderDate) Month
, count(*) RecordCount
FROM orders.orders
GROUP BY Year(OrderDate) 
, Month(OrderDate) 
ORDER BY Year
, Month

-- checking for duplicates
SELECT SalesOrderID
, LineItem
, count(*) RecordCount
FROM orders.orders ord 
GROUP BY SalesOrderID
, LineItem
HAVING count(*)>1

-- checking for biggest orders
SELECT SalesOrderID
, count(*) RecordCount
FROM orders.orders ord 
GROUP BY SalesOrderID
HAVING count(*) > 10
ORDER BY RecordCount DESC

-- checking for most purchased products
SELECT ProductID
, count(*) RecordCount
FROM orders.orders ord 
GROUP BY ProductID
HAVING count(*) > 8
ORDER BY RecordCount DESC

-- checking for most active customers
SELECT CustomerID
, count(*) RecordCount
FROM orders.orders ord 
GROUP BY CustomerID
HAVING count(*) > 10
ORDER BY RecordCount DESC

/* join checks */

-- Orders without Product (should be 0)
SELECT count(*) RecordCount
FROM orders.orders ord 
	 LEFT JOIN orders.products prd
	   ON ord.ProductID = prd.ProductID
WHERE prd.ProductID IS NULL

-- Orders without Customer (should be 0)
SELECT count(*) RecordCount
FROM orders.orders ORD 
	 LEFT JOIN orders.customers CST
	   ON ORD.CustomerID = CST.CustomerID
WHERE CST.CustomerID IS NULL

-- Products without Orders (153 records)
SELECT count(*) RecordCount
FROM orders.products prd
	 LEFT JOIN orders.orders ord 
	   ON prd.ProductID = ord.ProductID 
WHERE ord.ProductID IS NULL


-- Customers without Orders (815 records)
SELECT count(*) RecordCount
FROM orders.customers CST
	 LEFT JOIN orders.orders ORD
	   ON ORD.CustomerID = CST.CustomerID
WHERE ORD.CustomerID IS NULL

The more tables are involved, the more complex the validation logic can become. One should focus on the most important aspects.

Building the Logic

Once one has an acceptable understanding of the data entities involved and the relations between them, it's time to build the needed business logic by joining the various tables at the various levels of detail. One can focus on the minimum required, respectively attempt to build a general model that can address a broader set of requirements. For a PoC it's usually recommended to start small by addressing the immediate requirements, though some flexibility might be needed for exploring the data and preparing the logic for a broader set of requirements. Independently of the scope, one should consider a set of validations.

Usually, it makes sense to encapsulate the logic in several views or table-valued functions that reflect the logic for the main purposes and allow a high degree of reuse (see [1]). Of course, one can use the standard approach of modelling the bronze, silver, respectively the gold layers adopted by many professionals. For a PoC, even if that's not mandatory, it might still be a good idea to make steps in the respective direction.

In this case, dealing with only three tables - a fact table and two dimension tables - there are several perspectives that can be built:

a) all records from fact table + dimension records

The following view provides the lowest level of detail for the fact table, thus allowing one to look at the data from different perspectives, as long as the focus is only on the values used in the Sales Orders:

-- create the view
CREATE OR ALTER VIEW orders.vSalesOrders
-- Sales Orders with Product & Customer information
AS
SELECT ORD.SalesOrderID
, ORD.OrderDate
, ORD.CustomerID
, CST.FirstName 
, CST.LastName
, CST.EmailAddress
, ORD.LineItem
, ORD.ProductID
, PRD.ProductName 
, PRD.Category
, ORD.OrderQty
, ORD.LineItemTotal
, PRD.ListPrice 
, ORD.OrderQty * PRD.ListPrice ListPriceTotal
FROM orders.orders ORD 
	 JOIN orders.products PRD
	   ON ORD.ProductID = PRD.ProductID
	 JOIN orders.customers CST
	   ON ORD.CustomerID = CST.CustomerID

-- test the view   
SELECT *
FROM orders.vSalesOrders
WHERE SalesOrderID = 71780

Inner joins can be used as long as the referenced dimension records are always available; otherwise, left joins are needed so that no fact records are lost.
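A left-join variant could look as follows (a minimal sketch; the view's name and the reduced column list are illustrative):

-- left-join variant (sketch) to preserve orders with missing references
CREATE OR ALTER VIEW orders.vSalesOrdersAll
AS
SELECT ORD.SalesOrderID
, ORD.OrderDate
, ORD.CustomerID
, CST.FirstName 
, CST.LastName
, ORD.LineItem
, ORD.ProductID
, PRD.ProductName 
, PRD.Category
, ORD.OrderQty
, ORD.LineItemTotal
FROM orders.orders ORD 
	 LEFT JOIN orders.products PRD
	   ON ORD.ProductID = PRD.ProductID
	 LEFT JOIN orders.customers CST
	   ON ORD.CustomerID = CST.CustomerID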

b) aggregated data for all dimension combinations

The previous view allows one to aggregate the data at various levels of detail:

-- Sales volume by Customer & Product
SELECT ORD.EmailAddress
, ORD.ProductName 
, ORD.Category
, SUM(ORD.OrderQty) OrderQty
, SUM(ORD.LineItemTotal) LineItemTotal
FROM orders.vSalesOrders ORD 
WHERE ORD.OrderDate >= '2022-06-01'
  AND ORD.OrderDate < '2022-07-01'
GROUP BY ORD.EmailAddress
, ORD.ProductName 
, ORD.Category
ORDER BY ORD.EmailAddress
, ORD.ProductName 

One can comment out the dimensions not needed. The query can be included in a view as well, as sketched below.
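A possible sketch (the view's name is illustrative; the date filter moves into the consuming query, as views can't take parameters):

-- encapsulate the aggregation in a view (sketch)
CREATE OR ALTER VIEW orders.vSalesVolumeCustomerProduct
AS
SELECT ORD.EmailAddress
, ORD.ProductName 
, ORD.Category
, Year(ORD.OrderDate) [Year]
, Month(ORD.OrderDate) [Month]
, SUM(ORD.OrderQty) OrderQty
, SUM(ORD.LineItemTotal) LineItemTotal
FROM orders.vSalesOrders ORD 
GROUP BY ORD.EmailAddress
, ORD.ProductName 
, ORD.Category
, Year(ORD.OrderDate)
, Month(ORD.OrderDate)

-- test the view
SELECT *
FROM orders.vSalesVolumeCustomerProduct
WHERE [Year] = 2022
  AND [Month] = 6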

c) all records from each dimension table + aggregated fact records

Sometimes, it's useful to look at the data from a dimension's perspective, though it might be needed to create such an object for each dimension, like in the examples below. For maximum flexibility the logic can be included in a table-valued function:

-- create the user-defined function
CREATE OR ALTER FUNCTION orders.tvfProductsSalesVolume(
    @StartDate date NULL,
    @EndDate date NULL
)
RETURNS TABLE
-- Sales volume by Product
AS
RETURN (
SELECT PRD.ProductID
, PRD.ProductName 
, PRD.Category
, ORD.FirstOrderDate
, ORD.LastOrderDate 
, IsNull(ORD.TotalSalesQty, 0) TotalSalesQty 
, IsNull(ORD.TotalSalesValue, 0) TotalSalesValue
, IsNull(ORD.OrderCount, 0) OrderCount
, IsNull(ORD.LineCount, 0) LineCount
FROM orders.products PRD
     OUTER APPLY (
		SELECT Min(ORD.OrderDate) FirstOrderDate
		, Max(ORD.OrderDate) LastOrderDate 
		, SUM(ORD.OrderQty) TotalSalesQty
		, SUM(ORD.LineItemTotal) TotalSalesValue
		, count(DISTINCT SalesOrderID) OrderCount
		, count(*) LineCount
		FROM orders.orders ORD 
		WHERE ORD.ProductID = PRD.ProductID
		  AND ORD.OrderDate >= @StartDate 
		  AND ORD.OrderDate < @EndDate 
	 ) ORD
);

-- test the user-defined function
SELECT *
FROM orders.tvfProductsSalesVolume('2022-06-01','2022-07-01') PRD
WHERE TotalSalesValue <> 0
ORDER BY TotalSalesValue DESC
, LastOrderDate DESC


-- create the user-defined function
CREATE OR ALTER FUNCTION orders.tvfCustomersSalesVolume(
    @StartDate date NULL,
    @EndDate date NULL
)
RETURNS TABLE
-- Sales volume by Customer
AS
RETURN (
SELECT CST.CustomerID
, CST.FirstName 
, CST.LastName
, CST.EmailAddress
, ORD.FirstOrderDate
, ORD.LastOrderDate 
, IsNull(ORD.TotalSalesValue, 0) TotalSalesValue
, IsNull(ORD.OrderCount, 0) OrderCount
, IsNull(ORD.LineCount, 0) LineCount
FROM orders.customers CST
     OUTER APPLY (
		SELECT Min(ORD.OrderDate) FirstOrderDate
		, Max(ORD.OrderDate) LastOrderDate 
		, SUM(ORD.LineItemTotal) TotalSalesValue
		, count(DISTINCT SalesOrderID) OrderCount
		, count(*) LineCount
		FROM orders.orders ORD 
		WHERE ORD.CustomerID = CST.CustomerID
		  AND ORD.OrderDate >= @StartDate 
		  AND ORD.OrderDate < @EndDate 
	 ) ORD
);

-- test the user-defined function
SELECT *
FROM orders.tvfCustomersSalesVolume('2022-06-01','2022-07-01') CST
WHERE TotalSalesValue <> 0
ORDER BY TotalSalesValue DESC
, LastOrderDate DESC

When restructuring the queries in similar ways, there's always a compromise between various factors: (re)usability, performance and completeness.

Further Comments

The above database objects should allow users to address most of the requirements, though, as usual, there can also be exceptions, especially when the data needs to be aggregated at a different level of detail that requires the query to be structured differently.

The number of perspectives can also increase with the number of fact tables used to model a certain entity (e.g. Sales order headers vs. lines).
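For example, aggregating the line items at the order header level requires a differently structured query; a minimal sketch, based on the orders table used above:

-- aggregate line items at order header level (sketch)
SELECT ORD.SalesOrderID
, ORD.OrderDate
, ORD.CustomerID
, count(*) LineCount
, SUM(ORD.OrderQty) TotalQty
, SUM(ORD.LineItemTotal) OrderTotal
FROM orders.orders ORD
GROUP BY ORD.SalesOrderID
, ORD.OrderDate
, ORD.CustomerID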

In theory, one can also find ways to automate the process of creating database objects, though one must choose the relevant attributes, respectively include logic that makes sense only within a certain perspective. 

No matter the data, respectively the systems used as source, expect surprises and test your assumptions! For example, in the file used to create the orders.customers table, there seem to be duplicated entities with the same name and email address. One must clarify how such entities must be handled in data analysis, respectively in data modeling. For example, a person can appear twice because of the roles associated with the name, or there can be other legitimate reasons.

The files in scope of this post are small compared with the files existing in organizations. In many scenarios the files' size could range from GBs to TBs, thus requiring partitioning and various other strategies.

|>> Next Post

References
[1] sql-troubles (2023) Architecture Part IV: Building a Modern Data Warehouse with Azure Synapse [link]

Resources
[1] Microsoft Learn (2024) Fabric: Lakehouse and Delta Lake tables [link]

27 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part IV: Extracting vs. Concatenating Emails)

As shown in the previous two posts, Copilot helps to generate code even if this may involve multiple prompts and iterations. However, it works, and beginners can probably learn a lot from the code generated in the process. The current post is intended to provide some scenarios one can try in Copilot or any other prompt-based AI.

Disclaimer: The below code was generated by Copilot via prompts for demonstration purposes!

Use Case 1: Extracting Concatenated Emails 

Sometimes one needs to work with distribution lists (lists of emails delimited usually by ";") and extract the individual emails for various purposes (e.g. validation, user identification, etc.). The simplest solution is to use the STRING_SPLIT function introduced in SQL Server 2016, while previously the solutions involved parsing the string within a loop as part of a stored procedure or user-defined function (UDF). When there's no adequate formatting, it can still be necessary to parse the strings within loops.
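For reference, a direct call is enough when the input is already well formatted (a minimal sketch):

-- split a delimited list directly (SQL Server 2016+)
SELECT value Email
FROM STRING_SPLIT('email1@example.com;email2@example.com;email3@example.com', ';')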

Prompt:
Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the procedure. 

Output:

-- stored procedure generated by Copilot
CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Create a table to store the individual emails
    CREATE TABLE #EmailTable (Email NVARCHAR(255));

    -- Split the email list and insert into the table
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO #EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END


    -- Return the result set
    SELECT Email FROM #EmailTable;

    -- Clean up
    DROP TABLE #EmailTable;
END;

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

Prompt:
Create a user-defined function that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the function.

-- create the function
CREATE FUNCTION dbo.SplitEmails (@EmailList NVARCHAR(MAX))
RETURNS @EmailTable TABLE (Email NVARCHAR(255))
AS
BEGIN
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO @EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END

    RETURN;
END;


-- Test the SplitEmails function
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

SELECT * FROM dbo.SplitEmails(@EmailList);

Of the two solutions, the table-valued function (TVF) probably offers the most flexibility as it can be integrated more easily into further logic. It would be interesting to also perform some performance tests. As long as the data volume is negligible, a UDF is the way to go!

However, one can directly use the STRING_SPLIT function. One can wonder why the function wasn't used in the first place?!

Prompt:

Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails using the String_Split function. Write the code to test the procedure.

Output:

CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Use STRING_SPLIT to split the email list into individual emails
    SELECT value AS Email
    FROM STRING_SPLIT(@EmailList, ';');
END;

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

This is probably a good example to show that one should start with a narrow question and expand it as needed. Though, this depends also on the problem to solve.

Use Case 2: Concatenate Emails from Table

It's often needed to concatenate the emails for a Customer or another entity in order to send standard emails to the associated parties. I wrote several posts on the topic over the years, long before the STRING_AGG function was introduced in SQL Server 2017. Even if the STRING_AGG function can be called directly without further scaffolding, in the generated code it was used for its simplicity instead of a manual concatenation.
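For reference, the direct call is a one-liner (a minimal sketch; the table name matches the test data used below):

-- aggregate emails directly (SQL Server 2017+)
SELECT STRING_AGG(Email, ';') EmailList
FROM YourEmailTable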

In the below examples one can see how a small change in the prompt can have an important impact on the solution generated.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table. Provide the code to test the function.

Output:

-- create the user defined function
CREATE FUNCTION dbo.ConcatenateEmails()
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM YourEmailTable;

    RETURN @EmailList;
END;

-- test the function
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmails function
SELECT dbo.ConcatenateEmails() AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table provided as parameter. Provide the code to test the function.

Output: 

CREATE FUNCTION dbo.ConcatenateEmailsFromSP(@TableName NVARCHAR(128))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX);
    DECLARE @EmailList NVARCHAR(MAX);

    SET @SQL = 'SELECT @EmailList = STRING_AGG(Email, '';'') FROM ' + @TableName;

    EXEC sp_executesql @SQL, N'@EmailList NVARCHAR(MAX) OUTPUT', @EmailList OUTPUT;

    RETURN @EmailList;
END;

-- testing the code
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');


-- Test the ConcatenateEmails function
SELECT dbo.ConcatenateEmailsFromSP('YourEmailTable') AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Running the create statements leads however to the following error message:

"Msg 557, Level 16, State 2, Line 3, Only functions and some extended stored procedures can be executed from within a function."

At least in the previous versions of SQL Server, sp_executesql couldn't be executed from within a UDF. Even if it's possible to write such code, it will not work, as the example shows! One can wonder what led to this solution.
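For comparison, the same dynamic logic does work when wrapped in a stored procedure instead of a UDF. A sketch under the same assumptions, with QUOTENAME added as a minimal guard against injection via the table name:

-- stored procedure variant (sketch); dynamic SQL is allowed here
CREATE PROCEDURE dbo.ConcatenateEmailsViaSP
    @TableName NVARCHAR(128)
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX), @EmailList NVARCHAR(MAX);

    -- QUOTENAME guards the object name against injection
    SET @SQL = N'SELECT @EmailList = STRING_AGG(Email, '';'') FROM ' + QUOTENAME(@TableName);

    EXEC sp_executesql @SQL, N'@EmailList NVARCHAR(MAX) OUTPUT', @EmailList OUTPUT;

    SELECT @EmailList AS EmailList;
END;

-- test the procedure (assumes YourEmailTable from above exists)
EXEC dbo.ConcatenateEmailsViaSP 'YourEmailTable';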

Prompt:

Create a function in SQL Server that concatenates in a list the emails from a table variable provided as parameter. Provide the code to test the function.

Output:

CREATE FUNCTION dbo.ConcatenateEmailsFromTableVar(@EmailTable TABLE (Email NVARCHAR(255)))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;

Executing the code results in the following error message:

"Msg 156, Level 15, State 1, Line 1, Incorrect syntax near the keyword 'TABLE'."

Checking the documentation, the following limitation should be observed [1]:  

"Table-valued parameters must be passed as input READONLY parameters to Transact-SQL routines. You cannot perform DML operations such as UPDATE, DELETE, or INSERT on a table-valued parameter in the body of a routine." [1]

The correct use can be observed in the next example.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table-valued parameter. Provide the code to test the function.

Output:

-- create a table type for the table-valued parameter:
CREATE TYPE EmailTableType AS TABLE 
(
    Email NVARCHAR(255)
)

-- create the table-valued function
CREATE FUNCTION dbo.ConcatenateEmailsFromTVP(@EmailTable EmailTableType READONLY)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;


-- Declare a table variable of the table type and insert some test data
DECLARE @EmailTable EmailTableType;

INSERT INTO @EmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmailsFromTVP function
SELECT dbo.ConcatenateEmailsFromTVP(@EmailTable) AS EmailList;

Final Thoughts

See a previous post for the uses of the complementary STRING_SPLIT and STRING_AGG functions. If one doesn't know what one is doing, it's easy to go down the wrong path(s) or eventually choose suboptimal solutions even when assisted by AI. If blunders can happen for simple cases, what will happen when the solutions are hundreds or thousands of times more complex? How will the blunders multiply?
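The complementarity of the two functions can be seen in a simple round trip (a minimal sketch):

-- split a list, then rebuild it
DECLARE @EmailList NVARCHAR(MAX) = 'email1@example.com;email2@example.com;email3@example.com';

SELECT STRING_AGG(value, ';') Rebuilt
FROM STRING_SPLIT(@EmailList, ';');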

As can be seen from the non-working examples, Copilot has more problems besides hallucinations. Even if semantically some solutions are viable, the code might break in contact with reality! Moreover, even if something works, one needs to weigh the solutions in terms of performance and use specific techniques to address the issues found! AI is more likely to be feasible with high-level programming languages like C++ or Java that hopefully offer fewer chances for mistakes.

Once the code templates are generated, the code can be extended as needed. Frankly, even if the code is quite basic, the result is impressive considering the amount of work otherwise involved in writing it. One can but wonder whether the code was taken from a source, was part of the training data, respectively resulted from the AI's heuristics. It can probably be a combination of all.

Even if Copilot generated the template, the developers still need to be able to implement it in the broader solution, which sometimes might be more challenging than expected, especially when the details matter. The devil is in the details after all! Developers must also recognize which solution is more appropriate in terms of flexibility, maintenance, performance or whatever criteria may apply. I was wondering whether one can change the code to a degree that one can't recognize anymore that it was written by Copilot. Any ideas?

It's to be expected that in the near future such tools will be able to provide more complex solutions. I won't say that programming is dead, as some proclaim, though its focus will shift toward prompt engineering. Somebody, and I mean here the programmer or another technical person, must be able to evaluate the results critically and qualitatively, respectively identify the blunders and other types of issues arising in the process!

Happy coding!

Previous Post <<||>> Next Post 

References:
[1] Microsoft Learn (2024) SQL Server 2022: Use table-valued parameters [link]

Acronyms:
DML - Data Manipulation Language
TVF - Table-Valued Function
UDF - User-Defined Function

23 February 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part X: Templates for Database Objects)

One of the new features noticed in SQL databases when working on the previous post is the availability of templates. The functionality is useful even if it is kept to a minimum. Probably, more value can be obtained when used in combination with Copilot, which requires at least an F64 capacity.

Schemas

Schemas are used to create a logical grouping of objects such as tables, stored procedures, and functions. From a structural and security point of view it makes sense to create additional schemas to manage the various database objects and use the default dbo schema only occasionally (e.g. for globally created objects).

-- generated template - schema
CREATE SCHEMA SchemaName

-- create schema
CREATE SCHEMA Test

One can look at the sys.schemas to retrieve all the schemas available:

-- retrieve all schemas
SELECT schema_id
, name
, principal_id
FROM sys.schemas
ORDER BY schema_id

Tables

Tables, as the database objects that contain all the data in a database, are probably the elements that need the greatest attention in design and data processing. In some cases a table can be denormalized so that it stores all the data needed, much like in MS Excel, respectively normalized into fact and dimension tables.

Tables can be created explicitly by defining in advance their structure (see Option 1), respectively on the fly (see Option 2). 

-- Option 1
-- create the table manually by defining its structure in advance
CREATE TABLE [Test].[Customers](
	[CustomerId] [int] NOT NULL,
	[AddressID] [int] NULL,
	[Title] [nvarchar](8) NULL,
	[FirstName] [nvarchar](50) NULL,
	[LastName] [nvarchar](50) NULL,
	[CompanyName] [nvarchar](128) NULL,
	[SalesPerson] [nvarchar](256) NULL
) ON [PRIMARY]
GO

-- insert records
INSERT INTO Test.Customers
SELECT CustomerId
, AddressID
, Title
, FirstName 
, LastName
, CompanyName
, SalesPerson
FROM SalesLT.Customer

-- checking the output (both scenarios)
SELECT top 100 *
FROM Test.Customers
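Option 2 isn't shown above; presumably it relies on SELECT ... INTO, which creates the table on the fly from the query's output. A minimal sketch (drop the table first if it was already created via Option 1):

-- Option 2
-- create the table on the fly (sketch)
SELECT CustomerId
, AddressID
, Title
, FirstName 
, LastName
, CompanyName
, SalesPerson
INTO Test.Customers
FROM SalesLT.Customer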

One can look at the sys.tables to retrieve all the tables available:

-- retrieve all tables
SELECT schema_name(schema_id) schema_name
, object_id
, name
FROM sys.tables
ORDER BY schema_name
, name

Views

Views are much like virtual tables based on the result set of an SQL statement that combines data from one or multiple tables. They can be used to encapsulate logic, respectively project horizontally or vertically a subset of the data.

-- create view
CREATE OR ALTER VIEW Test.vCustomers
-- Customers 
AS
SELECT CST.CustomerId 
, CST.Title
, CST.FirstName 
, IsNull(CST.MiddleName, '') MiddleName
, CST.LastName 
, CST.CompanyName 
, CST.SalesPerson 
FROM SalesLT.Customer CST

-- test the view 
SELECT *
FROM Test.vCustomers
WHERE CompanyName = 'A Bike Store'

One can look at the sys.views to retrieve all the views available:

-- retrieve all views
SELECT schema_name(schema_id) schema_name
, object_id
, name
FROM sys.views
ORDER BY schema_name
, name

User-Defined Functions

A user-defined function (UDF) allows one to create a function by using a SQL expression. It can be used alone or as part of a query, as in the below example.

-- generated template - user defined function
CREATE FUNCTION [dbo].[FunctionName] (
    @param1 INT,
    @param2 INT
)
RETURNS INT AS BEGIN RETURN
    @param1 + @param2
END

-- user-defined function: 
CREATE OR ALTER FUNCTION Test.GetFirstMiddleLastName (
    @FirstName nvarchar(50),
    @MiddleName nvarchar(50),
    @LastName nvarchar(50)
)
RETURNS nvarchar(150) AS 
BEGIN 
   RETURN IsNull(@FirstName, '') + IsNull(' ' + @MiddleName, '') + IsNull(' ' + @LastName, '') 
END

-- test UDF on single values
SELECT Test.GetFirstMiddleLastName ('Jack', NULL, 'Sparrow')
SELECT Test.GetFirstMiddleLastName ('Jack', 'L.', 'Sparrow')

-- test UDF on a whole table
SELECT TOP 100 Test.GetFirstMiddleLastName (FirstName, MiddleName, LastName)
FROM SalesLT.Customer

One can look at the sys.objects to retrieve all the scalar functions available:

-- retrieve all scalar functions
SELECT schema_name(schema_id) schema_name
, name
, object_id
FROM sys.objects 
WHERE type_desc = 'SQL_SCALAR_FUNCTION'
ORDER BY schema_name
, name

However, UDFs prove to be useful when they mix the capabilities of functions with the ones of views, allowing one to create a "parametrized view" (see next example) or even encapsulate a multi-line statement that returns a dataset. Currently, there seems to be no template available for creating such functions.

-- table-valued function
CREATE OR ALTER FUNCTION Test.tvfGetCustomers (
    @CompanyName nvarchar(50) NULL
)
RETURNS TABLE
-- Customers by Company
AS
RETURN (
	SELECT CST.CustomerId 
	, CST.CompanyName
	, CST.Title
	, IsNull(CST.FirstName, '') + IsNull(' ' + CST.MiddleName, '') + IsNull(' ' + CST.LastName, '') FullName
	, CST.FirstName 
	, CST.MiddleName 
	, CST.LastName 
	FROM SalesLT.Customer CST
	WHERE CST.CompanyName = IsNull(@CompanyName, CST.CompanyName)
);

-- test function for values
SELECT *
FROM Test.tvfGetCustomers ('A Bike Store')
ORDER BY CompanyName
, FullName

-- test function for retrieving all values
SELECT *
FROM Test.tvfGetCustomers (NULL)
ORDER BY CompanyName
, FullName

One can look at the sys.objects to retrieve all the table-valued functions available:

-- retrieve all table-valued functions
SELECT schema_name(schema_id) schema_name
, name
, object_id
FROM sys.objects 
WHERE type_desc = 'SQL_INLINE_TABLE_VALUED_FUNCTION'
ORDER BY schema_name , name

Stored Procedures

A stored procedure is a prepared SQL statement that is stored as a database object and precompiled. Typically, the statements considered in SQL functions can also be created as stored procedures, however the latter don't allow reusing the output directly.

-- get customers by company
CREATE OR ALTER PROCEDURE Test.spGetCustomersByCompany (
    @CompanyName nvarchar(50) NULL
)
AS
BEGIN
	SELECT CST.CustomerId 
	, CST.CompanyName
	, CST.Title
	, IsNull(CST.FirstName, '') + IsNull(' ' + CST.MiddleName, '') + IsNull(' ' + CST.LastName, '') FullName
	, CST.FirstName 
	, CST.MiddleName 
	, CST.LastName 
	FROM SalesLT.Customer CST
	WHERE CST.CompanyName = IsNull(@CompanyName, CST.CompanyName)
	ORDER BY CST.CompanyName
	, FullName
END 

-- test the procedure 
EXEC Test.spGetCustomersByCompany NULL -- all customers
EXEC Test.spGetCustomersByCompany 'A Bike Store' -- individual customer
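If the procedure's output must nevertheless be reused, a common workaround is to capture it via INSERT ... EXEC into a temporary table. A minimal sketch, matching the columns of the procedure above (the lengths are assumptions):

-- capture the procedure's output (sketch)
CREATE TABLE #Customers (
    CustomerId int,
    CompanyName nvarchar(128),
    Title nvarchar(8),
    FullName nvarchar(160),
    FirstName nvarchar(50),
    MiddleName nvarchar(50),
    LastName nvarchar(50)
);

INSERT INTO #Customers
EXEC Test.spGetCustomersByCompany 'A Bike Store';

SELECT *
FROM #Customers;

DROP TABLE #Customers;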

One can look at the sys.objects to retrieve all the stored procedures available:

-- retrieve all stored procedures
SELECT schema_name(schema_id) schema_name
, name
, object_id
FROM sys.objects 
WHERE type_desc = 'SQL_STORED_PROCEDURE'
ORDER BY schema_name , name

In the end, don't forget to drop the objects created above (note the order of the dependencies):

-- drop function 
DROP FUNCTION IF EXISTS Test.GetFirstMiddleLastName

-- drop function 
DROP FUNCTION IF EXISTS Test.tvfGetCustomers 

-- drop procedure
DROP PROCEDURE IF EXISTS Test.spGetCustomersByCompany

-- drop view
DROP VIEW IF EXISTS Test.vCustomers

-- drop schema
DROP SCHEMA IF EXISTS Test

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Microsoft Fabric: Overview of Copilot in Fabric [link]
