Showing posts with label string functions. Show all posts
Showing posts with label string functions. Show all posts

05 November 2022

SQL Reloaded: STRING_AGG and STRING_SPLIT at Work, and a Bit of Pivoting

Working with strings across records was for long a nightmare for SQL developers until Microsoft introduced STRING_SPLIT in SQL Server 2016, respectively STRING_AGG in SQL Server 2017.  Previously, one was forced to write procedural language or use workarounds until SQL Server 2015, when recursive CTEs (common table expressions), Ranking and PIVOT were introduced, which allowed handling many scenarios. 

Microsoft provides several examples for the usage of STRING_SPLIT and STRING_AGG functions based on AdventureWorks database, though let's look at another example based on the same database. 

Let's say we want to show the concatenated Contacts for a store, result which can now easily be obtained by using the STRING_AGG:

-- concatenating names per store via STRING_AGG (SQL Server 2017+)
SELECT BusinessEntityID
, STRING_AGG(Concat(FirstName, ' ', LastName), ';') Contacts
FROM Sales.vStoreWithContacts
GROUP BY BusinessEntityID
HAVING count(*)>1

Observe that is needed to use a GROUP BY to show one record per Store. Unfortunately, there isn't yet a window function available for the same. 

The inverse operation can be performed with the help of STRING_SPLIT table-valued function (TVF). (If you wonder why is needed a TVF, it is because the initial record needs to be multiplied by the generated output.)

-- reversing the concatenation (SQL Server 2017+)
WITH CTE
AS (
	-- concatenating names per store
	SELECT BusinessEntityID
	, STRING_AGG(Concat(FirstName, ' ', LastName), ';') Contacts
	FROM Sales.vStoreWithContacts
	GROUP BY BusinessEntityID
	HAVING count(*)>1
) 
SELECT CTE.BusinessEntityID
, DAT.Value
, DAT.Ordinal 
FROM CTE
    CROSS APPLY STRING_SPLIT(Contacts, ';', 1) DAT

STRING_SPLIT provides also an ordinal field, which can be used in theory in pivoting the values, though we'd return then from where we started. Instead of using the query just generated, let's exemplify an alternative solution which is available with SQL Server 2005 for concatenating strings across records:
 
-- concatenating names per store via PIVOT (SQL Server 2012+)
SELECT BusinessEntityID
, [1] Contact1
, [2] Contact2
, [3] Contact3
, [4] Contact4
, Concat([1], IsNull(';' + [2], ''), IsNull(';' + [3], ''), IsNull(';' + [4], '')) Contacts
FROM (
	-- concatenating names and adding a rank
	SELECT BusinessEntityID
	, Concat(FirstName, ' ', LastName) Contact
	, ROW_NUMBER() OVER(PARTITION BY BusinessEntityID ORDER BY FirstName) Ranking
	FROM Sales.vStoreWithContacts
) PER
PIVOT (
    Max(Contact)
	FOR Ranking IN ([1], [2], [3], [4])
) AS DAT

It's needed to rewrite the Concat function to port the code on SQL Server 2005 though. 

Talking about workarounds for splitting strings, in certain scenarios I used a combination of CutLeft & CutRight functions, which proved to be useful in data migrations, or use my own version of STRING_SPLIT (see SplitListWithIndex or SplitList). For concatenations I used mainly CTEs (see example) or cursors for exceptional cases (see example).

07 December 2010

SQL Reloaded: Pulling the Strings of SQL Server I 7- Introduction

    The (character) string or simply the character data type, how is named by the MSDN documentation, is one of the primary data types available in a database and probably the most complex given the fact that it can encompass any combination of numeric, literal, date or any other primary data type. In fact it could include any type of chunk of text that could be written in any language as SQL Server supports Unicode and thus most of the (written) languages. In this post I’m not intending to make a complete anthology of strings, and neither to retake the whole volume of information available in MSDN or other important books. My intent is to look at strings from a slightly different perspective, considering the various functions involving strings and how they could be used in order to provide various functionality, in fact the cornerstone of everyday developer.

   A few things to remember:

1. there were mainly two types of non-unicode strings: the char (or character) of fixed length, respectively the varchar of variable length (varying character)

2. if initially both types of strings were having a maximum length of 8000 of characters, with SQL Server it’s possible to have a varchar with maximum storage size, declared as varchar(max).

3. if in the past there were some strict recommendations in what concerns the use of char or varchar, nowadays the varchar tends to be used almost everywhere, even for the strings of length 1.

4. talking about length, it denotes the number of chracters a string stores.

5. the trailing spaces, the spaces found at the right extremity of a string are typically ignored, while the leading spaces, the spaces found at the left extremity, are not ignored.

6. starting with SQL Server 2000, for the two character data types were introduced the corresponding unicode data types prefixed with n (national): nchar, respectively nvarchar.

7. given the fact that a unicode character needs more space to store the same non-unicode string, actually the number of bits doubles, the maximum length for an unicode string is only 4000.

8. there is also a non-unicode text, respectively ntext unicode data type, designed to store maximum length, though as it seems they could become soon deprecated, so might be a good idea to avoid it.

9. not-initialized variables, including strings, have the value NULL, referred also the NULL string, therefore it’s always a good idea to initialize your variables.

10. by empty string string is designated the string containing no character “’’”, and has the length 0.

11. there are several specific functions available for the creation, manipulation and conversion of strings from and to other data types.

12. not all of the aggregated functions work with string values (specifically the ones requesting a number value like SUM, AVG, STDV).

13. the operations performed on strings of different data types are generally not impacted by this aspect, though there are some exceptions.

14. there are several (basic) operations with strings, typically concatenation, extraction of subpart of a string, insertion, replacement or deletion of characters, rearrangement of string’s characters, trimming, splitting it in substrings (decomposition), etc.

15. there are several numeric values based on strings: length, position of a given text in a text, number of not empty characters, encoding of a character, on whether the text represents a valid numeric or date values, etc.

05 December 2010

SQL Reloaded: A Look into Reverse

Some time back I was mentioning Stuff as one of the functions rarely used in SQL Server. Here’s another one: Reverse. I remember I used such a function in other programming languages and maybe 1-2 times in SQL Server, and I can’t say I seen it in action in other posts. As it name states and the MSDN documentation confirms, Reverse returns the reverse of a string value. The function takes as parameter any string value or value that could be implicitly conversed to a string, otherwise you’ll have to do an explicit conversion. As can be seen from the below first example the reverse of the “string” string is “gnirts”, and, as per the second example, the reverse of the revers of a string is the initial string itself. The third and fourth example use in exchange a numeric value.
 
-- reversing a string using Reverse function 
SELECT Reverse('string') Example1 
, Reverse(Reverse('string')) Example2 
, Reverse(1234567890) Example3 
, Reverse(Reverse(1234567890)) Example4 
Example1 Example2 Example3 Example4
gnirts string 987654321 1234567890

It seems there is not much to be said about Reverse function, and in the few situations that you need it, it’s great to have it, otherwise you would have to built it yourself. Let’s try to imagine there is no Reverse function in SQL Server, how would we provide such functionality by using existing functionality? In general such an exercise might look like lost time, though it could be useful in order to present several basic techniques. In this case the Reverse functionality could be provided by using a simple loop that iterates through string’s characters changing their order. The below piece of code should return the same output as above:

-- reversing a string using looping 
DECLARE @String varchar(50) 
DECLARE @Retval varchar(50) 
DECLARE @Index int 
SET @String = 'string' 
SET @Index = 1 
SET @Retval = '' 
WHILE @Index <= Len(@String) 
BEGIN  
   SET @Retval = Substring(@String, @Index, 1) + @Retval   
   SET @Index = @Index + 1  
END 
SELECT @Retval 

Some type of problems whose solution involve the sequential processing of values within loops could be solved also using recursive techniques (recursion) that involve the sequential processing of values in which the output of a given intermediary step is based on the previous step. Recursion imply a slightly different view of the same problem, it could prove itself to be maybe less efficient in some cases and quite a useful technique in the others. SQL Server provides two types of recursion, one at function level involving functions, stored procedures are triggers, respectively at query level implemented in common table expressions. Recursive functions, functions that call themselves, have been for quite a while in SQL Server though few of the database books I read really talk about them. Probably I will detail the topic in another post, though for the moment I will show only the technique at work. In this case the recursive approach is quite simple:

-- reversing a string using a recursive function 
CREATE FUNCTION dbo.ReverseString( 
@string varchar(50)) 
RETURNS varchar(50) 
AS 
BEGIN           
     RETURN (CASE WHEN Len(@string)>1 THEN dbo.ReverseString( Right(@string, len(@string)-1)) + Left(@string, 1) ELSE @string END) 
END 
 
-- testing 
SELECT dbo.ReverseString('string') Example1 
, dbo.ReverseString(dbo.ReverseString('string')) Example2 

Common table expressions (CTE), since their introduction with SQL Server 2005, became quite an important technique in processing hierarchical structures like BOMs, which involve the traversal of the whole tree. A sequence is actually a tree having the width of 1 node, so it seems like a CTE’s job:
  
-- reversing a string using common table expression 
DECLARE @String varchar(50) 
SET @String = 'string' 
;WITH CTE(String, [Index]) 
AS (     SELECT Cast('' as varchar(50)) String 
    , 1 [Index] 
    UNION ALL 
    SELECT Cast(Substring(@String, [Index], 1) + String as varchar(50)) String 
    , [Index] + 1 [Index] 
    FROM CTE 
    WHERE [Index]<=Len(@String) 
) 
SELECT @String Reversed 
FROM CTE 
WHERE [Index]-1=Len(@String) 

Unlike the previous two solutions, the CTE approach involves more work, work that maybe isn’t required just for the sake of using the technique. When multiple alternative approaches exist, I prefer using the (simplest) solution that offers the highest degree of flexibility. In this case is the loop, though in others it could be a recursive function or a CTE.

14 September 2007

SQL Reloaded: The Stuff function

No matter how much experience we have in a programming language or a technology, there is always something new to learn. There are a few hardly used functions in SQL Server 2000, but they could be really useful in certain situations. One of such functions is Stuff, I discovered its use long time after I started to play with SQL. 

  Stuff ( character_expression , start , length , character_expression )

--inserting a string inside another without doing a replacement
SELECT Stuff('This is just a test', 3, 0, 'x')  -- Output: Thxis is just a test 

--inserting a string inside another without doing a replacement 
SELECT Stuff('This is just a test', 3, 5, 'at was') --Output: That was just a test  

 So, it could be useful when we check whether a character is on a certain position, and replace it with another character. Normally we would have to write something like: 

DECLARE @string varchar(50) 
SET @string = 'DE1988299X8829' 
SELECT 
    CASE WHEN Substring(@string, 10,1) = 'X' THEN Stuff(@string, 10, 1, 'Y') 
   ELSE @string 
END 
Output: DE1988299Y8829     

Another function I haven't saw too often in SQL logic is Replicate, yeah, it does exactly what it's name suggests - it takes a string and replicates it's content multiple times. 

 Replicate ( character_expression , integer_expression ) 

 SELECT Replicate ('0', 10) --Output: 0000000000 
 SELECT Replicate ('tone', 3) --Output: tonetonetone     

The function could be useful when we need to put a number of characters in front of a value. For example a table contains integer values, but in a report we need them with leading zeroes (e.g. 00056 instead of 56). I tried to implement such functionality as a function, in a previous posting with the help of Space function; using Space and Replace functions can be obtained the same result as using Replicate

  SELECT Replace(Space(3), ' ', 'tone') --Output: tonetonetone

SQL Reloaded: Useful Functions in Data Conversion

    When working with data, even if it's about reporting, data conversion or data analysis, there are a few useful functions which makes life easier. They deal with three types of basic variables and their variations: strings (e.g. char, varchar), numeric (e.g. int, decimal) and date (e.g. smalldatetime, datetime).  

1. Case Function

CASE input_expression 
        WHEN when_expression 
        THEN result_expression [ ...n ] 
       [ELSE else_result_expression] 
END   
        
Judging from experience, CASE is the most used function in data conversions, when the output value is based on on one or multiple constraints over the already existing values. 
  
CASE WHEN column1 = 'yes' OR column2 = 'other' THEN 'yes/other' END    

 Just be careful, when the ELSE is not provided, a null will be returned! 

Another important thing, the values returned must have the same base type, dates, strings or numeric values can't be mixed without conversion! 

--example 1 
CASE 
    WHEN DateDiff(d, DateValue, GetDate())>0 THEN 'past' WHEN DateDiff(d, DateValue, GetDate())=0 THEN 'today' 
    ELSE 'future' 
END 

--example 2 
CASE 
     WHEN column1 BETWEEN 1 AND 100 THEN '1st range' 
     WHEN column1 BETWEEN 101 AND 1000 THEN '2nd range' 
     ELSE '3rd range' 
END 

--example 3 
CASE 
    WHEN column1 IN ('A', 'B', 'C') THEN '1st set' 
    WHEN column1 IN ('C', 'D', 'E') THEN '2nd set' 
    ELSE '3rd set' 
END      

 There is a saying for DBAs and not only, "Know your data!". As each statement is checked until comparison operation evaluated True, it is better to arrange the WHEN expression in the descending order of highest occurrence. 

 2. IsNull Function

IsNull(expression)   
  
 It's always a good idea to check for NULL values, they could occur anytime when a default value is not provided for a value. One of the cases I often found was when joining two tables when one of the values from JOIN was NULL. 

-- example for string values 
SELECT * 
FROM table1 A 
         LEFT JOIN table2 B 
             ON A.column1 = B.column1 
           AND IsNull(A.column2, ' ') = IsNull(B.column2, ' ') 
WHERE B.column1 IS NULL  

-- example for numeric values 
SELECT * 
FROM table1 A 
      LEFT JOIN table2 B 
          ON A.column1 = B.column1  
        AND IsNull(A.column2, 0) = IsNull(B.column2, 0) 
WHERE B.column1 IS NULL 

The ' ' and 0 values should be used when they aren't found in the list of possible values or there is no impact on the result. Another used value for IsNull function is 'N/A', to point out the missing value. 

 3. Coalesce Function

COALESCE ( expression [ ,...n ] )  
       
There are cases when is needed to take the first not null value from a list of columns, could be dates or any type of other value. Instead of writting something like IsNull(Date1, IsNull(Date2, IsNull(Date3, GetDate()) it's much easier to write COALESCE(Date1, Date2, Date3, GetDate()) 

-- example coalesce 
SELECT * 
FROM table1 
WHERE COALESCE(Column1, Column2, Column3) IS NOT NULL 

-- example IsNull alternative 
SELECT * 
FROM table1 
WHERE Column1 IS NOT NULL 
        OR Column2 IS NOT NULL 
        OR Column3 IS NOT NULL 

 4. NullIf Function
  
NullIf ( expression , expression )    

NullIf function is handy especially when is not needed a certain value in output, for example 0 or the .  

-- example numeric values 
SELECT column1
, NullIf(column2, 0) column2
FROM table_name     

-- example string or date values 
SELECT column1
, NullIf(column2, '') column2
FROM table_name     

When multiple values are not needed, it's easier maybe to write 

-- example multiple checks 
 SELECT column1 
, CASE 
     WHEN column2 IN ('A', 'B', 'V') THEN NULL 
     ELSE column1 END 
FROM table_name 

-- alternative 
SELECT column1, NullIf(NullIf(NullIf(column2, 'A'), 'B'), 'V') 
FROM table_name

 5. Convert Function

CONVERT ( data_type [ ( length ) ] , expression [ , style ] )       

 I used to use CAST instead CONVERT, however CONVERT is more powerful, especially when it comes to dates. SQL Server comes with a few dates formats obtain by providing a value for style parameter: 

SELECT CONVERT(varchar(10), GetDate(),101) -- US format: mm/dd/yyyy
SELECT CONVERT(varchar(10), GetDate(),102) -- ANSI format: yyyy.mm.dd
SELECT CONVERT(varchar(10), GetDate(),103) -- British/French dd/mm/yyyy
SELECT CONVERT(varchar(10), GetDate(),104) -- German format: dd.mm.yyyy
SELECT CONVERT(varchar(12), GetDate(),105) -- Italian format: dd-mm-yyyy
SELECT CONVERT(varchar(12), GetDate(),106) -- dd mon yyyy
SELECT Upper(CONVERT(varchar(12), GetDate(),106))   -- upper case of: dd mon yyyy
SELECT CONVERT(varchar(8), GetDate(),112) -- ISO format (preferred): yyyymmdd
SELECT CONVERT(varchar(10), GetDate(), 23) -- yyyy-mm-dd
SELECT CONVERT(varchar(23), GetDate(),126) -- yyyy-mm-ddThh:mi:ss.mmm

The conversion of a date to a string is mainly required by the need to have the date in a special format or not allow Excel to format the date to its default. 

 6. IsNumeric Function

IsNumeric ( expression )     

This function shouldn't miss when casting string values to numeric values and doing operations with them. 

SELECT CAST(column1 as decimal(12,2)) + 101 
FROM table_name 
WHERE IsNumeric(column1)=1     

Please note that when multiple constraints are used, the IsNumeric function should appear first after the WHERE clause, a query like the next one might raise an error if one of the column1 values is not numeric. 

-- example 
SELECT CAST(column1 as decimal(12,2)) + 101 
FROM table_name 
WHERE column2 = 'value' 
AND IsNumeric(column1)=1

-- alternative writing (recommended)
SELECT CAST(column1 as decimal(12,2)) + 101 
FROM table_name WHERE IsNumeric(column1)=1 
   AND column2 = 'value' 

 7. IsDate Function

IsDate( expression )          

 It works the same way as IsNumeric, but on date data types. Thus if the provided values is a date, it will return 1, otherwise 0. 

  SELECT IsDate(GetDate()) 
  SELECT IsDate('11/31/2007') --november has only 30 days 
  SELECT CASE IsDate('11/31/2007') WHEN 1 THEN 'Yes' ELSE 'No' END 

 Note:     
For more information on the function is always a good idea to consult Microsoft Technical documentation.
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.