05 November 2022

SQL Reloaded: STRING_AGG and STRING_SPLIT at Work, and a Bit of Pivoting

Working with strings across records was for long a nightmare for SQL developers until Microsoft introduced STRING_SPLIT in SQL Server 2016, respectively STRING_AGG in SQL Server 2017.  Previously, one was forced to write procedural language or use workarounds until SQL Server 2015, when recursive CTEs (common table expressions), Ranking and PIVOT were introduced, which allowed handling many scenarios. 

Microsoft provides several examples for the usage of STRING_SPLIT and STRING_AGG functions based on AdventureWorks database, though let's look at another example based on the same database. 

Let's say we want to show the concatenated Contacts for a store, result which can now easily be obtained by using the STRING_AGG:

-- concatenating names per store via STRING_AGG (SQL Server 2017+)
SELECT BusinessEntityID
, STRING_AGG(Concat(FirstName, ' ', LastName), ';') Contacts
FROM Sales.vStoreWithContacts
GROUP BY BusinessEntityID
HAVING count(*)>1

Observe that is needed to use a GROUP BY to show one record per Store. Unfortunately, there isn't yet a window function available for the same. 

The inverse operation can be performed with the help of STRING_SPLIT table-valued function (TVF). (If you wonder why is needed a TVF, it is because the initial record needs to be multiplied by the generated output.)

-- reversing the concatenation (SQL Server 2017+)
WITH CTE
AS (
	-- concatenating names per store
	SELECT BusinessEntityID
	, STRING_AGG(Concat(FirstName, ' ', LastName), ';') Contacts
	FROM Sales.vStoreWithContacts
	GROUP BY BusinessEntityID
	HAVING count(*)>1
) 
SELECT CTE.BusinessEntityID
, DAT.Value
, DAT.Ordinal 
FROM CTE
    CROSS APPLY STRING_SPLIT(Contacts, ';', 1) DAT

STRING_SPLIT provides also an ordinal field, which can be used in theory in pivoting the values, though we'd return then from where we started. Instead of using the query just generated, let's exemplify an alternative solution which is available with SQL Server 2005 for concatenating strings across records:
 
-- concatenating names per store via PIVOT (SQL Server 2012+)
SELECT BusinessEntityID
, [1] Contact1
, [2] Contact2
, [3] Contact3
, [4] Contact4
, Concat([1], IsNull(';' + [2], ''), IsNull(';' + [3], ''), IsNull(';' + [4], '')) Contacts
FROM (
	-- concatenating names and adding a rank
	SELECT BusinessEntityID
	, Concat(FirstName, ' ', LastName) Contact
	, ROW_NUMBER() OVER(PARTITION BY BusinessEntityID ORDER BY FirstName) Ranking
	FROM Sales.vStoreWithContacts
) PER
PIVOT (
    Max(Contact)
	FOR Ranking IN ([1], [2], [3], [4])
) AS DAT

It's needed to rewrite the Concat function to port the code on SQL Server 2005 though. 

Talking about workarounds for splitting strings, in certain scenarios I used a combination of CutLeft & CutRight functions, which proved to be useful in data migrations, or use my own version of STRING_SPLIT (see SplitListWithIndex or SplitList). For concatenations I used mainly CTEs (see example) or cursors for exceptional cases (see example).

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.