SQL Troubles: concatenation

Showing posts with label concatenation. Show all posts

27 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part IV: Extracting vs. Concatenating Emails)

As shown in the previous two posts, Copilot helps to generate code even if this may involve multiple prompts and iterations. However, it works and probably new beginners can learn a lot from the code generated in the process. The current post is intended to provide some scenarios you can try in Copilot or any other prompt-based AI.

Disclaimer: The below code was generated by Copilot via prompts for demonstration purposes!

Use Case 1: Extracting Concatenated Emails

Sometimes one needs to work with distribution lists (list of emails delimited usually by ";"), following to extract the individual emails for the various purposes (e.g. validation, user's identification, etc.). The simplest solution is to use the STRING_SPLIT function introduced in SQL Server 2016, while previously the solutions involved parsing the sting within a loop as part of a stored procedure or user-defined function (UDF). When there's no adequate formatting, it can still be feasible to parse strings within loops.

Prompt:
Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the procedure.

Output:

-- stored procedure generated by Copilot
CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Create a table to store the individual emails
    CREATE TABLE #EmailTable (Email NVARCHAR(255));

    -- Split the email list and insert into the table
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO #EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END


    -- Return the result set
    SELECT Email FROM #EmailTable;

    -- Clean up
    DROP TABLE #EmailTable;
END;

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

Prompt:
Create a user-defined function that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails. Write the code to test the function.

-- create the function
CREATE FUNCTION dbo.SplitEmails (@EmailList NVARCHAR(MAX))
RETURNS @EmailTable TABLE (Email NVARCHAR(255))
AS
BEGIN
    DECLARE @Pos INT, @NextPos INT, @Email NVARCHAR(255);
    SET @Pos = 1;
    SET @EmailList = @EmailList + ';'; -- Add a delimiter at the end

    WHILE CHARINDEX(';', @EmailList, @Pos) > 0
    BEGIN
        SET @NextPos = CHARINDEX(';', @EmailList, @Pos);
        SET @Email = SUBSTRING(@EmailList, @Pos, @NextPos - @Pos);
        INSERT INTO @EmailTable (Email) VALUES (@Email);
        SET @Pos = @NextPos + 1;
    END

    RETURN;
END;


-- Test the SplitEmails function
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

SELECT * FROM dbo.SplitEmails(@EmailList);

From the two solutions probably the table-valued function (TVF) offers the most flexibility as it can be easier integrated in further logic. It would be interesting to also perform some performance texts. As long as the data volume is neglectable, an UDF is the way to go!

However, one can directly use the String_Split function. One can wonder why the function wasn't used in the first place?!

Prompt:

Create a stored procedure that takes a list of emails delimited by a set of semicolons and returns them as a dataset of individual emails using the String_Split functio. Write the code to test the procedure.

Output:

CREATE PROCEDURE SplitEmails
    @EmailList NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Use STRING_SPLIT to split the email list into individual emails
    SELECT value AS Email
    FROM STRING_SPLIT(@EmailList, ';');
END;

-- Test the SplitEmails stored procedure
DECLARE @EmailList NVARCHAR(MAX);
SET @EmailList = 'email1@example.com;email2@example.com;email3@example.com';

EXEC SplitEmails @EmailList;

This is probably a good example to show that one should start with a narrow question and expand it as needed. Though, this depends also on the problem to solve.

Use Case 2: Concatenate Emails from Table

It's often needed to concatenate the emails for a Customer or other entity to send standard emails to the parties associated with the respective emails. I wrote over the years several posts on the topic, long before the STRING_AGG function was introduced in SQL Server 2017. Even if the STRING_AGG function can be called directly without the need to use other scaffolding, it was used for its simplicity instead of doing a concatenation.

In the below examples one can see how a small change in the prompt can have an important impact on the solution generated.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table. Provide the code to test the function.

Output:

-- create the user defined function
CREATE FUNCTION dbo.ConcatenateEmails()
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM YourEmailTable;

    RETURN @EmailList;
END;

-- test the function
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmails function
SELECT dbo.ConcatenateEmails() AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table provided as parameter. Provide the code to test the function.

Output:

CREATE FUNCTION dbo.ConcatenateEmailsFromSP(@TableName NVARCHAR(128))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX);
    DECLARE @EmailList NVARCHAR(MAX);

    SET @SQL = 'SELECT @EmailList = STRING_AGG(Email, '';'') FROM ' + @TableName;

    EXEC sp_executesql @SQL, N'@EmailList NVARCHAR(MAX) OUTPUT', @EmailList OUTPUT;

    RETURN @EmailList;
END;

-- testing the code
-- Create a sample table and insert some test data
CREATE TABLE YourEmailTable (Email NVARCHAR(255));

INSERT INTO YourEmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');


-- Test the ConcatenateEmails function
SELECT dbo.ConcatenateEmailsFromSP('YourEmailTable') AS EmailList;

-- Clean up the sample table
DROP TABLE YourEmailTable;

Running the create statements leads however to the following error message:

"Msg 557, Level 16, State 2, Line 3, Only functions and some extended stored procedures can be executed from within a function."

At least in the previous versions of SQL Server, sp_executesql couldn't be executed from a UDF. Even if it's possible to write such code, it will not work, as the example shows! It's to wonder what led to this solution.

Prompt:

Create a function in SQL Server that concatenates in a list the emails from a table variable provided as parameter. Provide the code to test the function.

Output:

CREATE FUNCTION dbo.ConcatenateEmailsFromTableVar(@EmailTable TABLE (Email NVARCHAR(255)))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;

Executing the code results in the following error message:

"Msg 156, Level 15, State 1, Line 1, Incorrect syntax near the keyword 'TABLE'."

Checking the documentation, the following limitation should be observed [1]:

"Table-valued parameters must be passed as input READONLY parameters to Transact-SQL routines. You cannot perform DML operations such as UPDATE, DELETE, or INSERT on a table-valued parameter in the body of a routine." [1]

The correct use should be observed in the next example.

Prompt:
Create a function in SQL Server that concatenates in a list the emails from a table-valued parameter. Provide the code to test the function.

Output:

-- create a table type for the table-valued parameter:
CREATE TYPE EmailTableType AS TABLE 
(
    Email NVARCHAR(255)
)

-- create the table-valued function
CREATE FUNCTION dbo.ConcatenateEmailsFromTVP(@EmailTable EmailTableType READONLY)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @EmailList NVARCHAR(MAX);

    SELECT @EmailList = STRING_AGG(Email, ';')
    FROM @EmailTable;

    RETURN @EmailList;
END;


-- Declare a table variable of the table type and insert some test data
DECLARE @EmailTable EmailTableType;

INSERT INTO @EmailTable (Email) VALUES 
('email1@example.com'),
('email2@example.com'),
('email3@example.com');

-- Test the ConcatenateEmailsFromTVP function
SELECT dbo.ConcatenateEmailsFromTVP(@EmailTable) AS EmailList;

Final Thoughts

See a previous post for the uses of STRING_SPLIT and STRING_AGG complementary functions. If one doesn't know what is doing, it's easy to go on the wrong path(s) or eventually choose suboptimal solutions even when assisted by AI. If blunders can happen for simple cases, what will happen when the size of the solutions is hundreds and thousands times more complex? How will the blunders multiply?

As it can be seen from the not working examples, Copilot has more problems besides hallucinations. Even if semantically some solutions are viable, the code might break in contact with reality! Moreover, even if something works, one needs to weigh solutions in terms of performance and use specific techniques to address them! AI is more likely feasible in working with high-level programming languages like C++ or Java that hopefully offer fewer chances for mistakes.

Once the code templates are generated, the code can be extended as needed. Frankly, even if the code is quite basic, the result is impressive considered through the amount of work involved to generate it. One can but wonder whether the code was taken from a source, part of the training, respectively part of the AI heuristics. It can probably be a combination of all.

Even if Copilot generated the template, the developers still need to be able to implement it in the broader solution, which sometimes might be more challenging than expected, especially when the details matter. The devil is in the details after all! Developers must also recognize which solution is more appropriate in terms of flexibility, maintenance, performance or whatever criteria may apply. I was wondering whether one can change the code to a degree that one can't recognize anymore that it was written by Copilot. Any ideas?

It's to be expected that in the near future such tools will be able to provide more complex solutions. I will not say that programming is dead, as some proclaim, though that will shift its focus in the area of prompt engineering. Somebody, and I mean here the programmer or other technical person, must be able to evaluate critically and qualitatively the results, respectively identify the blunders and other types of issues resulting in the process!

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) SQL Server 2022: Use table-valued parameters [link]

Acronyms:
DML - Data Manipulation Language
TVF - Table-Valued Function
UDF - User-Defined Function

08 February 2025

🌌🏭KQL Reloaded: First Steps (Part VIII: Translating SQL to KQL - Full Joins)

One of the great features of KQL is the possibility of translating SQL code to KQL via the "explain" keyword, allowing thus to port SQL code to KQL, respectively help translate knowledge from one programming language to another.

Let's start with a basic example:

// transform SQL to KQL code (to be run only the first part from --)
--
explain
SELECT top(10) CustomerKey, FirstName, LastName, CityName, CompanyName 
FROM Customers 
ORDER BY CityName DESC

// output: translated KQL code 
Customers
| project CustomerKey, FirstName, LastName, CityName, CompanyName
| sort by CityName desc nulls first
| take int(10)

The most interesting part of the translation is how "explain" translate joins from SQL to KQL. Let's start with a FULL JOIN from the set of patterns considered in a previous post on SQL joins:

--
explain
SELECT CST.CustomerKey
, CST.FirstName + ' ' + CST.LastName CustomerName
, Cast(SAL.DateKey as Date) DateKey
, SAL.TotalCost
FROM NewSales SAL
    JOIN Customers CST
      ON SAL.CustomerKey = CST.CustomerKey 
WHERE SAL.DateKey > '20240101' AND SAL.DateKey < '20240201'
ORDER BY CustomerName, DateKey, TotalCost DESC

And, here's the translation:

// translated code
NewSales
| project-rename ['SAL.DateKey']=DateKey
| join kind=inner (Customers
| project-rename ['CST.CustomerKey']=CustomerKey
    , ['CST.CityName']=CityName
    , ['CST.CompanyName']=CompanyName
    , ['CST.ContinentName']=ContinentName
    , ['CST.Education']=Education
    , ['CST.FirstName']=FirstName
    , ['CST.Gender']=Gender
    , ['CST.LastName']=LastName
    , ['CST.MaritalStatus']=MaritalStatus
    , ['CST.Occupation']=Occupation
    , ['CST.RegionCountryName']=RegionCountryName
    , ['CST.StateProvinceName']=StateProvinceName) 
    on ($left.CustomerKey == $right.['CST.CustomerKey'])
| where ((['SAL.DateKey'] > todatetime("20240101")) 
    and (['SAL.DateKey'] < todatetime("20240201")))
| project ['CST.CustomerKey']

    , CustomerName=__sql_add(__sql_add(['CST.FirstName']
    , " "), ['CST.LastName'])
    , DateKey=['SAL.DateKey']
    , TotalCost
| sort by CustomerName asc nulls first
    , DateKey asc nulls first
    , TotalCost desc nulls first
| project-rename CustomerKey=['CST.CustomerKey']

The code was slightly formatted to facilitated its reading. Unfortunately, the tool doesn't work well with table aliases, introduces also all the fields available from the dimension table, which can become a nightmare for the big dimension tables, the concatenation seems strange, and if one looks deeper, further issues can be identified. So, the challenge is how to write a query in SQL so it can minimize the further changed in QKL.

Probably, one approach is to write the backbone of the query in SQL and add the further logic after translation.

--
explain
SELECT NewSales.CustomerKey
, NewSales.DateKey 
, NewSales.TotalCost
FROM NewSales 
    INNER JOIN Customers 
      ON NewSales.CustomerKey = Customers.CustomerKey 
WHERE DateKey > '20240101' AND DateKey < '20240201'
ORDER BY NewSales.CustomerKey
, NewSales.DateKey

And the translation looks simpler:

// transformed query
NewSales
| join kind=inner 
(Customers
| project-rename ['Customers.CustomerKey']=CustomerKey
    , ['Customers.CityName']=CityName
    , ['Customers.CompanyName']=CompanyName
    , ['Customers.ContinentName']=ContinentName
    , ['Customers.Education']=Education
    , ['Customers.FirstName']=FirstName
    , ['Customers.Gender']=Gender
    , ['Customers.LastName']=LastName
    , ['Customers.MaritalStatus']=MaritalStatus
    , ['Customers.Occupation']=Occupation
    , ['Customers.RegionCountryName']=RegionCountryName
    , ['Customers.StateProvinceName']=StateProvinceName) 
    on ($left.CustomerKey == $right.['Customers.CustomerKey'])
| where ((DateKey > todatetime("20240101")) 
    and (DateKey < todatetime("20240201")))
| project CustomerKey, DateKey, TotalCost
| sort by CustomerKey asc nulls first
, DateKey asc nulls first

I would have written the query as follows:

// transformed final query
NewSales
| where (DateKey > todatetime("20240101")) 
    and (DateKey < todatetime("20240201"))
| join kind=inner (
    Customers
    | project CustomerKey
        , FirstName
        , LastName 
    ) on $left.CustomerKey == $right.CustomerKey
| project CustomerKey
    , CustomerName = strcat(FirstName, ' ', LastName)
    , DateKey
    , TotalCost
| sort by CustomerName asc nulls first
    , DateKey asc nulls first

So, it makes sense to create the backbone of a query, translate it to KQL via explain, remove the unnecessary columns and formatting, respectively add what's missing. Once the patterns were mastered, there's probably no need to use the translation tool, but could prove to be also some exceptions. Anyway, the translation tool helps considerably in learning. Big kudos for the development team!

Notes:
1) The above queries ignore the fact that Customer information is available also in the NewSales table, making thus the joins obsolete. The joins were considered only for exemplification purposes. Similar joins might be still needed for checking the "quality" of the data (e.g. for dimensions that change over time). Even if such "surprises" shouldn't appear by design, real life designs continue to surprise...
2) Queries should be less verbose by design (aka simplicity by design)! The more unnecessary code is added, the higher the chances for errors to be overseen, respectively the more time is needed to understand and validated the queries!

Happy coding!

Previous Post <<||>> Next Post

References
[1] Microsoft Lear n (2024) Azure: Query data using T-SQL [link]

04 February 2025

🌌🏭KQL Reloaded: First Steps (Part III: Basic Joins)

As data is usually dispersed over multiple tables, in any query languages is quintessential to know how to bring the data together for further analysis, joins allowing to achieve this is SQL as well in KQL. One usually starts with the main table and joins the further tables to it. In a data warehouse, the main table is a fact table to which the dimension tables are further joined to add details, respectively to filter on the dimensions.

Before starting to build any logic, it's important to get an idea of tables' cardinality and which are the keys on which the data to be join. Some models, like the one in the ContosoSales model, are straightforward and the columns participating in the joins have the same names, or at least similar names.

When writing and testing queries, one can start with a small subset of the data, join the tables together, and at the end validate the logic as needed. Selecting a small interval (e.g. a month, week, or even a day) and the records for 1-2 representative records from a dimensional table (e.g. Customers) helps in the process for minimizing the data movement and

// check how many records are available 
// Sales table: 28980 based on the filter
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()

// Products: 2517 records
Products
| summarize record_count = count()

// Customers: 18484 records
Customers 
| summarize record_count = count()

In a second step one can look at the individual values, identify uniqque indentifiers, the columns that participate in joins, respectively other columns of interest. If needed, one can five deeper by checking the number for individual values. Selecting 10-100 records is usually enough for getting an idea about the dataset, though there are also many exceptions:

// review the data
// Sales table: 28980 based on the filter
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| take 10

// Products: 2517 records
Products
| take 10

// Customers: 18484 records
Customers 
| take 10

In KQL one must start with the fact table (e.g NewSales) and add the dimensions accordingly. Alternatively, it's useful to group only by ProductKey, respectively by CustomerKey.

// review the basis output by Products & Customers
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
    , TotalCost = sum(TotalCost) by ProductKey, CustomerKey
| order by TotalCost desc
| summarize result_record_count = count()

Now let's join the Product dimension table to the NewSales fact table via the join operator:

// total by product via join
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
    , TotalCost = sum(TotalCost) by ProductKey
| join kind=inner (
    Products 
    | where  ProductCategoryName == 'TV and Video'
    )
    on ProductKey
| project ProductName, TotalCost
//| summarize record_count = count() 
| order by TotalCost desc

One can join now also the second dimension table and eventually select only the columns in scope, respectively do further calculations (see CustomerName):

// total by product & customer via joins
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
    , TotalCost = sum(TotalCost) by ProductKey, CustomerKey
| join kind=inner (
    Products 
    | where  ProductCategoryName == 'TV and Video'
    | project ProductKey, ProductName 
    )
    on ProductKey
| join kind=inner (
    Customers
    | project CustomerKey, CustomerName = strcat(FirstName, ' ', LastName)
    )
    on CustomerKey
| project CustomerName, ProductName, TotalCost
//| summarize record_count = count() 
| order by CustomerName asc, ProductName asc

Once the query built, one can go ahead and remove the pieces of logic not needed, an reevaluate the number of records with a count (see commented line).

Alternatively, one can rewrite the above query as follows via the lookup operators:

// total by Product & Customer via lookup
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
    , TotalCost = sum(TotalCost) by ProductKey, CustomerKey
| lookup Products on ProductKey
| lookup Customers on CustomerKey 
//| summarize record_count = count()

| project CustomerName = strcat(FirstName, ' ', LastName), ProductName, TotalCost

The two last queries should produce the same results. The first query allows more flexibility in what concerns the constraints applied, while the second is more compact while it also allows to easier test how the number of records changes by added each join. This set of tests is done to make sure that duplicates aren't created in the process, respectively that records are not removed unnecessarily from the logic.

If the fields used in joins don't have the same name, which happens in data many models, one can use the $left and $right to refer to the table participating in the join. This syntax makes the query a bit more verbose, though it brings more clarity. Even if some vendors provide similar syntax (e.g. USING in Oracle), one may prefer the syntax that offers clarity.

// total by product via lookup with table alias
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
    , TotalCost = sum(TotalCost) by ProductKey, CustomerKey
| lookup Products on $left.ProductKey == $right.ProductKey
| lookup Customers on $left.CustomerKey == $right.CustomerKey
| project CustomerName = strcat(FirstName, ' ', LastName), ProductName, TotalCost

Notes:
1) For readability purposes, it might be useful to use indentation for the lines that are continuation of the previous line. Therefore, if the line doesn't start with a pipe ("|"), the next line starts a few characters (a tab) further.

2) According to the documentation (see [1]), the "lookup" operator optimizes the performance of queries where a fact table is enriched with data from a dimension table.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Kusto Tutorial: Join data from multiple tables [link]

17 June 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VII (List of Values)

Introduction

Lists are one of the basic structures in Mathematics, the term referring to an (ordered) set of elements separated by comma, space or any other delimiter (e.g. “:”, “;”). The elements of a list can be numbers, words, functions, or any other type of objects. In the world of databases, a list is typically formed out of the values of a given column or a given record, however it could span also a combination of rows and records, is such cases two delimiters being needed – one for column and one for row. From here comes probably the denomination of list of values. In a more general accept a list of values could be regarded as a delimited/concatenated subset. Such lists are formed when needed to send the data between the layers of an application or applications, this type of encoding being quite natural. In fact, also the data in a database are stored in similar tabular delimited structure, more complex though.

An useful example in which the list of values are quite handy is the passing of multiple values within the parameter of stored procedure or function (see example). This supposes first building the list and then use the values in a dynamic build query (like in the before mentioned example) or by building a table on the fly. We can call the two operations composition, respectively decomposition of list of values.

Composition

Composition, whether on vertical or horizontal is nothing but a concatenation in which the values alternate with one or more delimiters. Let’s reconsider the concatenation based on the values of a Person.AddressType AdventureWorks table. As the logic for concatenating for one or more attributes is the same, the below example concatenates a list based on a single attribute, namely AddressTypeID in SingleList, respectively two attributes, AddressTypeID and Name.

-- concatenation of values across a table 
;WITH CTE (AddressTypeID, Name, Ranking) 
AS (--preparing the data       
     SELECT AddressTypeID  
     , Name 
     , ROW_NUMBER () OVER(ORDER BY Name) Ranking 
     FROM Person.AddressType 
     -- WHERE ... 
) 
, DAT (SingleList, DoubleList, Ranking) 
AS ( -- concatenating the values 
     SELECT Cast(AddressTypeID as varchar(max)) SingleList 
     , Cast('('+ Cast(AddressTypeID as varchar(10)) + ',''' + Name + ''')' as varchar(max)) DoubleList 
     , Ranking 
     FROM CTE 
     WHERE Ranking = 1 
     UNION ALL 
     SELECT DAT.SingleList + ',' + Cast(CTE.AddressTypeID as varchar(20)) SingleList 
    , Cast(DAT.DoubleList + ', ('+ Cast(CTE.AddressTypeID as varchar(10)) + ',''' + CTE.Name + ''')' as varchar(max)) DoubleList 
    , CTE.Ranking  
     FROM CTE          
       JOIN DAT           
          ON CTE.Ranking = DAT.Ranking + 1       
)  

-- the lists 
SELECT SingleList 
, DoubleList 
FROM DAT 
WHERE Ranking = (SELECT MAX(Ranking) FROM DAT)

The second example is based on atypical delimiters, resembling to the structure built for a batch insert or table value constructor-based statement, and as we’ll see later, ideal to be used in a dynamically-built query

Decomposition

Decomposition follows the inverse path, though it’s much easier to exemplify. In fact it’s used the same technique introduced in the last example from the previous post belonging to the same cycle, Subparts of a String, in which a space was used as delimiter. Another example is the dbo.SplitList function which decomposes a string using a loop.

-- decomposition of a string to a table using CTE 
CREATE FUNCTION dbo.StringToTable( 
 @str varchar(500) 
,@Delimiter char(1)) 
RETURNS @Temp TABLE ( 
Id int NOT NULL 
,Value varchar(50)) 
AS 
BEGIN  
     ;WITH CTE (PrevString, Position, Word)  
     AS (  
     SELECT LTrim(RTrim( CASE  
           WHEN CharIndex(@Delimiter, @str)>;0 THEN Right(@str, Len(@str)-CharIndex(@Delimiter, @str))  
           ELSE ''  
      END)) PrevString  
     , 1 Position  
     , LTrim(RTrim(CASE  
           WHEN CharIndex(@Delimiter, @str)>0 THEN LEFT(@str, CharIndex(@Delimiter, @str)-1)  
           ELSE @str  
       END)) Word  
      UNION ALL  
      SELECT LTrim(RTrim(CASE  
            WHEN CharIndex(@Delimiter, PrevString)>0 THEN Right(PrevString, Len(PrevString)-CharIndex(@Delimiter, PrevString))  
             ELSE ''  
       END)) PrevString  
      , Position + 1 Position  
      , LTrim(RTrim(CASE  
           WHEN CharIndex(@Delimiter, PrevString)>0 THEN LEFT(PrevString, CharIndex(@Delimiter, PrevString)-1)  
          ELSE PrevString  
      END)) Word      FROM CTE  
     WHERE Len(PrevString)>0  
    )  
     INSERT @Temp(Id, Value) 
     SELECT Position  
     , Word      FROM CTE  
     OPTION (maxrecursion 100)  
     RETURN 
END

Here are two examples based on the single list created above and another one based on alphabet:

-- decomposing a list
SELECT Id 
, value 
FROM dbo.StringToTable('6,1,2,3,4,5', ',')     


-- decomposing the "alphabet" 
SELECT Id 
, value 
FROM dbo.StringToTable('a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z', ',')

Even if the function deals only with a delimiter, it could be used to decompose lists involving multiple delimiters, as long the list is adequately built:

-- decomposing double list 
SELECT Id 
, value 
, Left(value, CHARINDEX(',', value)-1) LeftValue 
, Right(value, len(value)-CHARINDEX(',', value)) RightValue 
FROM dbo.StringToTable('6,Archive;1,Billing;2,Home;3,Main Office;4,Primary;5,Shipping', ';')

The tables built thus from list of values can be further used in queries when needed to create a table on the fly. It would be interesting maybe to show that the composition and decomposition are inverse functions, however that’s out of scope, at least for current set of posts.

SQL Troubles

Pages

27 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part IV: Extracting vs. Concatenating Emails)

08 February 2025

🌌🏭KQL Reloaded: First Steps (Part VIII: Translating SQL to KQL - Full Joins)

04 February 2025

🌌🏭KQL Reloaded: First Steps (Part III: Basic Joins)

17 June 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VII (List of Values)

Introduction

Composition

About Me