14 December 2010

Pulling the Strings of SQL Server – Part III: Concatenation

    Typically a database in general, and a table in particular, that follows the normalization rules, is designed to have the columns contain the smallest semantic chunks of data, it could be a Street, a Zip Code, City, a person’s First or Last Name, but also a large chunk of text like a Description or a Comment. No matter how well designed is a database, there will always be the need to do various operations with strings, typically concatenation, extraction of subpart of a string, insertion or deletion of characters, rearangement of string’s characters, trimming, splitting it in substrings, or of getting various numeric values: length, position of a given text, number of not empty characters, on whether the text represents a valid numeric or date values, etc. In the following posts I will attempt to address the respective operations in the context of select statements, and let’s start with concatenation.

    Concatenation is the operation of joining two or more string values within an expression. In SQL Server the “+” operator is used for concatenation, and it could be used to concatenate two or more members. In order to concatenate two members from which one of them is a string, the second term needs to be explicitly converted to a string data type, otherwise an error will occur. For readability or post-processing purposes, the strings are concatenated using a delimiter in order to delimit the boundaries of the initial value, it could be used a space, a comma, a semicolon, a pipe, a tab or any other character that could be used typically in delimiting columns.

-- concatenation of strings
SELECT 'a string' + ' and ' + 'a second string' Example1
, 'a string' + ',' + 'a second string' Example2

, '1245' + '67890' Example3

, '1245' + '.' + '67890' Example4


   The concatenation of string variables or columns functions based on the same principles:
-- concatenating string variables
DECLARE @string1 varchar(50)
DECLARE
@string2 varchar(50)

DECLARE
@string3 varchar(50)


SET @string1 = 'this is a string'
SET
@string2 = 'this is another string'


SELECT @string1 + ' and ' + @string2 Example1
,
@string1 + char(31) + @string2 Example2

,
@string1 + ', ' + @string2 Example3

,
@string1 + ' ' + @string3 Example4

,
@string1 + IsNull(@string3, '!') Example5

concatenation 2 

    Here’s another example based on the concatenation of columns coming from two joined tables from AdventureWorks database:
-- concatenating columns of joined tables
SELECT PAD.AddressID
,
IsNull(PAD.AddressLine1, '')

+
IsNull(', ' + PAD.AddressLine2, '')
+
IsNull(', ' + PAD.City, '')
+
IsNull(', ' + PAD.PostalCode, '')
+
IsNull(', ' + PSP.Name, '') Address
FROM
Person.Address PAD
      JOIN Person.StateProvince PSP
         ON PAD.StateProvinceID = PSP.StateProvinceID

concatenation 3
    As stressed in the previous post, the NULL values need to be adequately handled either by initializing values or by using the IsNull or COALESCE functions. The concatenation of strings combined with IsNull function could be used creatively in order to add a comma only when a value is not null, as in the above example.

   There are scenarios in which is needed to concatenate the values belonging to the same column but from different records, for example concatenating the email values in order to send a single email to all the users in one single action. Before the introduction of common table expressions (CTE), wasn’t possible to concatenate the string values belonging to different records, at least not in a query, this functionality being achieved by using cursors or loops, or simply performed on client or intermediate layers. As I already gave an example on how to use cursor in order to loop through the values of a table and concatenate them (see “Cursors and Lists” post), I will focus on the use of loops and simple CTEs.

    Loops are one of the basic functionality in programming languages, no matter of their complexity or type. Either if are talking about WHILE, FOR, foreach or do … until loops, the principles are the same: perform a list of actions until one or more conditions are met. In this case the actions performed is reduced to a set of concatenations based on the letters of the (English) alphabet:
-- concatenation within a loop
DECLARE @list varchar(max)
DECLARE
@index int

SET
@list = ''

SET
@index = ASCII('a')
WHILE (@index<ASCII('z'))
BEGIN

     SET @list = @list + ', ' + Char(@index)

     SET @index = @index + 1

END
SELECT
@list Result


    There is more work that needs to be performed in order to remove the leading comma from the output, but that’s a topic for the next post, when discussing about decomposition of strings.

    CTEs are a little more complex to use than the loops, though the concatenation could be achieved across records and this in one query and not in procedural language as in the above example. In order delimit the two components of a CTE, I made use of a second CTE which simulates the existence of a given table:
-- concatenation across records
WITH Data(Column1, Ranking)
AS 
(
-- preparing test data

       SELECT 'A' Column1, 0 Ranking
       UNION ALL 
       SELECT 'B' Column1, 1 Ranking
       UNION ALL
       SELECT 'C' Column1, 2 Ranking
)
, Result(Column1, Ranking)

AS
(
  -- performing the actual concatenation
      SELECT Cast(Column1 as nvarchar(max)) Column1 , Ranking FROM Data WHERE Ranking = 0
      UNION ALL
      SELECT Cast(B.Column1 + ',' + A.Column1 as nvarchar(max)) Column1 , A.Ranking
      FROM Data A
         JOIN Result  B
             ON A.Ranking - 1 = B.Ranking
)
SELECT Column1
FROM Result

WHERE Ranking IN (SELECT MAX(Ranking) FROM Result


    The logic for doing a simple concatenation seems maybe complicated, though the volume of work is not so big if we ignore the first CTE. On the other side I introduced an “index” within the Ranking column, fact that allows processing easier the records. When dealing with the records coming from a table it’s probably much easier to use one of the ranking functions that suits best.

No comments: