05 February 2011

💎🏭SQL Reloaded: Deleting Sequential Data From a Table

Last week I run into an interesting solution to a simple problem, problem I dealt with in several occasions too: how to delete (efficiently) sequential data, or how LuborK calls it in his post, “a problem of efficient ordered delete”.  In this case he’s talking about the deletion of the first n records from a dataset of sequential data. It sounds like a simple problem considering that we can easily select in a first step the first n records using the TOP function and then delete the respective matched records. That’s also what LuborK does, however the novelty of the solution proposed it’s a little unexpected, and this not from the point of view of the approach, but of the gain in performance. The bottom line: by encapsulating the inner query within a view, the deletion of sequential data becomes more efficient.

While reading the respective post I thought: what if the inner query is encapsulated in a inline table-valued function?! Would the performance remain the same or deprecate? Does it make sense to add the overhead of creating a view for each such scenario in order to gain some performance? Then, reading the comments, one of them was pointing something so obvious: the inner query can be encapsulated within a common table expression, following to perform on it the deletion. An elegant solution I haven’t had available under SQL Server 2000 at the times I had to solve the same problem. Actually my approach was a little different, at that time identifying and storing the greatest value of a new data set, building thus a collection of values that would allow me to select a given range. This approach was quite useful in the respective scenarios and quite acceptable as performance, plus the fact that I was reusing the greatest value in further logic – two birds with one shot.

This being said, I was thinking that except the methods provided by LuborK, I could provide in this post several other simple techniques. Please note that I’m not focusing on performance, but on the techniques. Eventually you can display the statistics related to the disk activity and time required to parse, compile and execute each batch of statements. So, let’s start with the creation and population of the table! In order to minimize the differences, I took the same example and added just a little make-up – comments and a small change in the way the values of the second column are inserted.  

-- dropping the table 
-- DROP TABLE dbo.TestTable 

--declaring the table 
CREATE TABLE dbo.TestTable ( 
  id int primary key 
, value varchar (50))  

-- inserting the records in a loop 
DECLARE @index int  
SET @index=1  
SET NOCOUNT ON  
WHILE (@index<100000)  
BEGIN 
    INSERT INTO dbo.TestTable  
    VALUES (@index,'test '+cast(@index as varchar(10))) 
    SET @index=@index+1  
END  

Supposing we are interested in the first 10000 of records of a sequential data set, it would be much easier if we would know the maximum value from the respective data set. Then we could use the value to perform a simple deletion:
 
DECLARE @max as int 

-- deletion based on stored max value DECLARE @max int 
SELECT @max = MAX(ID) 
FROM( 
    SELECT top(10000) id 
    FROM TestTable  
    ORDER BY ID 
) A 

--deleting the data  
DELETE FROM TestTable 
WHERE id <= @max 

If we don’t need the maximum value for further logic, the two statements in can be combined in one query:

-- deletion based on max value - within one query DELETE FROM TestTable 
WHERE id <= ( 
     SELECT MAX(ID) 
     FROM ( 
          SELECT top(10000) id 
          FROM dbo.TestTable  
          ORDER BY ID 
     ) A 
) 

--deleting the data  
DELETE FROM TestTable 
WHERE id <= @max 
       
When dealing with huge volumes of data, and not only then, in case of sequential processing we can store the maximum value in a given table, and pull the values as required. This approach allows us to process any interval within the data set, in this case the delete resumes to a simple delete statement in which @start and @end were retrieved from the respective table:
 
-- deleting the data within a given interval 
DELETE FROM dbo.TestTable 
WHERE id BETWEEN @start AND @end 

If this seems trivial, let’s spice up things! As I observed, few people know that it’s possible to delete data from a given table using an inline table-valued function. Huh, what’s that?! If it seems impossible to you, take a look at documentation! I’m referring here to the GRANT object permissions section in which of interest is the following statement that refers to the ANSI 92 permissions applicable to the various database objects, for table-valued function being listed the following permissions: DELETE, INSERT, REFERENCES, SELECT, UPDATE. In order to exemplify the idea, let’s create the UDF:
 
-- creating the UDF function 
CREATE FUNCTION dbo.TestUDF1( 
@start int 
, @end int) 
RETURNS TABLE 
AS RETURN ( 
    SELECT id, value  
    FROM dbo.TestTable  
    WHERE id BETWEEN @start AND @end 
)  

--deleting the data 
 DELETE FROM dbo.TestUDF1(20001, 30000) 

The example is supposed to work, unless you are having a problem with the permissions. Let’s consider this functionality from the perspective of our problem! What LuborK does is to include the inner query in a view and then delete the data:
 
-- creating the view 
CREATE VIEW dbo.vTestView 
AS 
SELECT top(10000) id 
FROM dbo.TestTable  
ORDER BY ID  

--deleting the data 
DELETE FROM dbo.vTestView 

One of the problems with this approach is that the value representing the number of records to be returned is hardcoded. And here comes to help an inline table-valued function, also referred as a “parameterized view”: 

-- creating the UDF function 
CREATE FUNCTION dbo.TestUDF2( 
@number_records int) 
RETURNS TABLE 
AS RETURN ( 
    SELECT top (@number_records) id, value  
    FROM dbo.TestTable  
    ORDER BY id 
)        

--deleting the data 
DELETE FROM dbo.TestUDF2(10001)  

 And, as closure, the example based on the common table expression:

 -- deletion using a CTE 
;WITH CTE  
AS ( 
     SELECT top(10000) id 
     FROM dbo.TestTable  
     ORDER BY ID 
)  
DELETE FROM CTE 

You can review the record count after each operation:

-- review record count
SELECT count(*) 
FROM dbo.TestTable;

Notes:
The queries work also in SQL databases in Microsoft Fabric (see file in GitHub repository). You might want to use another schema (e.g. Test), not to interfere with the existing code. 

Happy coding!

30 January 2011

💠🛠️ SQL Server: Learning Resources

During the past weeks I found several interesting learning resources on SQL Server 2008 topics, so here they are:

For those interest to upgrade their knowledge to SQL Server 2008 R2 Edition check the Microsoft SQL Server 2008 R2 Update for Developers Training Course videos on Channel9. They cover topics like Location Awareness, CLR Integration, StreamInsight, Reporting Services, Application and Multi-Server Management, improvements on BLOB, T-SQL and tools for Excel 2010 or SharePoint 2010.

The ER diagrams of SQL Server’s DMVs or DMFs are available for download as PDF at SQL Server 2008 Systems Views Map, and the same for SQL Server 2005, respectively SQL Server 2008 R2. Even if not all objects and possible relations are shown, they could still save you from lot of effort as the diagram depicts the relations between the most important entities and the cardinality existing between them, plus the additional constraints participating in the joins. The entities represented: Objects, Types and Indexes, Linked Servers, CLR, Database Mirroring, Service Broker, Resource Governor, Transactions, Databases and Storage, Traces and Events, Execution Environment, Server information.

There are several free SQL Server books on SQLServerCentral.com coming from Redgate: A. Kuznetsov’s Defensive Database Programming, B. McGehee’s SQL Server Maintenance Plans, How to become an Exceptional DBA, Mastering SQL Server Profiler and Brad’s Sure Guide to SQL Server 2008, J. Magnabosco’s Protecting SQL Server Data, G. Fritchey’s SQL Server Execution Plans and SQL Server Crib Sheet Compendium (coauthor with A. Prasanna), R. Landrum’s SQL Server Tacklebox, plus several volumes of ‘Best of SQL Server Central’ authors in Vol. 1-4, Vol. 5, Vol. 6 and Vol. 7, and several other books to come.

It seems there are a few new books also on Scribd website: R. Colledge’s SQL Server Administration in Action, R. Vieira’s Professional Microsoft SQL Server 2008 Programming, M. Lee and M. Mansfield’s SQL Server 2008 Administration Instant Reference, plus many other books that can be retrieved by searching SQL Server 2008 on the respective site.


There are several papers coming from Microsoft and several professionals. It worth to check them:
-   K. Lassen’s paper on Using Management Data Warehouse for Performance Monitoring, in which he provides several best practices for performance management of SQL Server 2008. Except the introduction in Management Data Warehouse features, he provides also an extension of the built-in functionality for indexes, plus several functions and queries.
-  D. Kiely's paper on SQL Server 2008 R2 Security Overview for Database Administrators covers some of the most important security features in SQL Server 2008. There is a similar paper targeting SQL Server 2005.
-  B. Beauchemin’s paper on SQL Server 2005 Security Best Practices - Operational and Administrative Tasks describes the best practices for setting up and maintaining security in SQL Server 2005.
-  SQL Server 2008 Compliance Guide a paper written to help organizations and individuals understand how to use the features of the Microsoft® SQL Server® 2008 database software to address their compliance need.
-  SQL Server 2008 Upgrade Technical Reference Guide essential phases, steps and best practices to upgrade existing instances of SQL Server 2000 and 2005 to SQL Server 2008.
-  Technet Wiki Checklists on Database Engine Security, Database Engine Security, Database Engine Connections and  Data Access. (probably more to come)
-   Also Microsoft maintains a list of White Papers on SQL Server 2008, respectively SQL Server 2005.

07 January 2011

💎🏭SQL Reloaded: Pulling the Strings of SQL Server IV (Spaces, Trimming, Length and Comparisons)

In the previous post on concatenation, I was talking about the importance of spaces and other delimiters in making concatenations’ output more “readable”. Excepting their importance in natural language, the spaces have some further implication in the way strings are stored and processed. As remarked in the introductory post from this topic, there are two types of spaces that stand out in the crowds of spaces, namely the trailing spaces, the spaces found at the right extremity of a string,  respectively the leading spaces, the spaces found at the left extremity of a string. 

Are few the cases when the two trailing space are of any use, therefore databases like SQL Server usually ignore them. The philosophy about leading space is slightly different because there are cases in which they are used in order to align the text to the right, however there are tools which are cutting off the leading spaces. When no such tools are available or any of the two types of spaces are not cut off, then we’ll have do to it ourselves, and here we come to the first topic of this post, trimming.

Trimming

Trimming is the operation of removing the empty spaces found at the endings of a string. Unlike other programming languages which use only one function for this purpose (e.g. Trim function in VB or Oracle), SQL Server makes use of two functions used for this purpose, LTrim used to trim the spaces found at the left ending of the string, respectively RTrim, used to trim the spaces found at the right ending of the string.

-- trimming a string 
SELECT  LTrim(' this is a string ') Length1 -- left trimming 
, RTrim(' this is a string ') Length2 --right trimming 
, LTrim(RTrim(' this is a string ')) Length2 --left & right trimming 

As can be seen it’s not so easy to identify the differences, maybe the next function will help to see that there is actually a difference.

Note:
1) If it looks like the two trimming functions are not working with strings having leading or trailing spaces, then maybe you are not dealing with an empty character but rather with other characters like CR, LF, CRLF or other similar characters, rendered sometimes like an empty character.
2)   In SQL Server 2017 was introduced the Trim function which not only replaces the combined use of LTrim and RTrim functions, but it allows to replace other specified characters (including CR, LF, Tab) from the start or end of a string. (see post

Length

Before approaching other operations with strings, it’s maybe useful (actually necessary as we will see) to get a glimpse of the way we can determine the length of a string value, in other words how many characters it has, this being possible by using the Len function:

-- length of a string 
SELECT Len('this is a string') Length1 -- simple string 
, Len('this is a string ') Length2 --ending in space 
, Len(' this is a string') Length3 --starting with a space 
, Len(' this is a string ') Length4 --starting &amp; ending with a space 
, Len(LTrim(' this is a string ')) Length5 --length & left trimming 
,Len(RTrim(' this is a string ')) Length5 --length & right trimming
,Len(LTrim(RTrim(' this is a string '))) Length5 --length, left & right trimming    

In order to understand the above results, one observation is necessary: if a strings ends in with one or more empty characters, the Len function ignores them, though this doesn’t happen with the leading empty characters, they needing to be removed explicitly if needed.

Comparisons

The comparison operation points the differences or similarities existing between two data types, involving at minimum two expressions that reduce at runtime to a data type and the comparison operator. This means that each member of comparison could include any valid combinations of functions as long they are reduced to compatible data types. In what concerns the comparison of strings, things are relatively simple, the comparison being allowed  independently on whether they have fix or varying length. Relatively simple because if we’d have to go into details, then we’d need to talk about character sets (also called character encoding or character maps) and other string goodies the ANSI SQL standard(s) are coming with, including a set of rules that dictate the behavior of comparisons. So, let’s keep things as simple as possible. As per above attempt of definition, a comparison implies typically an equality, respectively difference, based on equal (“=”), respectively not equal (“<>” or “!=”). Here are some simple examples:

-- sample comparisons 
SELECT CASE WHEN 'abc' != 'abc ' THEN 1 ELSE 0 END Example1 
, CASE WHEN ' abc' != 'abc' THEN 1 ELSE 0 END Example2 
, CASE WHEN ' ' != '' THEN 1 ELSE 0 END Example3 
-- error comparison , CASE WHEN 'abc' != NULL THEN 1 ELSE 0 END Example4 
, CASE WHEN 'abc' = NULL THEN 1 ELSE 0 END Example5 
-- adequate NULL comparison , CASE WHEN 'abc' IS NOT NULL THEN 1 ELSE 0 END Example6  
, CASE WHEN 'abc' IS NULL THEN 1 ELSE 0 END Example7 
Output:
Example1 Example2 Example3 Example5 Example7
0 1 0 0 0

The first three examples are demonstrating again the behavior of leading, respectively trailing spaces. The next two examples, even if they seem quite logical in terms of natural language semantics, they are wrong from the point of view of SQL semantics, and this because the comparison of values in which one of them is NULL equates to a NULL, thus resulting the above behavior in which both expressions from the 4th and 5th example equate to false. The next two examples show how the NULLs should be handled in comparisons with the help of IS operator, respectively it’s negation – IS NOT. 

 Like in the case of numeric values, the comparison between two strings could be expressed by using the “less than” (“<;”) and “greater than” (“?”) operators, alone or in combination with the equality operator (“<=”, “>=”) or the negation operator (“!>”, “<!”) (see comparison operators in MDSN). Typically an SQL Server database is case insensitive, so there  will be no difference between the following strings: “ABC”, “abc”, “Abc”, etc. Here are some examples:

-- sample comparisons (case sensitive) 
SELECT CASE WHEN 'abc' < 'ABC' THEN 1 ELSE 0 END Example1 
, CASE WHEN 'abc' > 'abc' THEN 1 ELSE 0 END Example2 
, CASE WHEN 'abc' >= 'abc ' THEN 1 ELSE 0 END Example3 
, CASE WHEN 'abc' <> 'ABC' THEN 1 ELSE 0 END Example4 
, CASE WHEN 'abc' > '' THEN 1 ELSE 0 END Example5 
, CASE WHEN ' ' > '' THEN 1 ELSE 0 END Example6 
Output:
Example1 Example2 Example3 Example4 Example5 Example6
0 0 1 0 1 0


The case sensitivity could be changed at attribute, table or database level. As we don’t deal with a table and the don’t want to complicate too much the queries, let’s consider changing the sensitivity at database level. So if you are using a non-production database, try the following script in order to enable, respectively to disable the case sensitivity:

--enabling case sensitivity for a database 
ALTER DATABASE <database name>  
COLLATE Latin1_General_CS_AS  

--disabling case sensitivity for a database 
ALTER DATABASE <database name> 
COLLATE Latin1_General_CI_AS 
 
In order to test the behavior of case sensitivity, enable first the sensitivity and then rerun the previous set of example (involving case sensitivity).
Output:
Example1 Example2 Example3 Example4 Example5 Example6
1 0 1 1 1 0
After that you could disable again the case sensitivity by running the last script. Please note that if your database has other collation, you’ll have to change the scripts accordingly in order to point to your database’s collation.

Notes:
The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

03 January 2011

💎SQL Reloaded: Pulling the Strings of SQL Server V (Character Indexes)

A string is in fact a concatenation or chain of characters, each character in the string having a position referred as character’s index. In programming it’s useful to identify the first occurrence of a character or string in forward or backwards order, with a few simple tricks being possible to identify all the occurrences of the characters or whole strings in a string. Such functions are appearing in some of the well known programming languages under names like InStr, IndexOf for forward processing, respectively, InStrRev, LastIndexOf. SQL Server has only one function for this purpose, namely CharIndex, used to retrieve the position where a character is found starting from a given position. If no occurrence was found, the function returns 0, while if no start position if given, the default is 0. There are several other remarks that could be found in MSDN, and it’s actually always a good idea to consult the documentation from time to time, not only when meeting a new function. But let’s get back to the CharIndex function with a few examples based on searching a character within a given string:

-- searching a character into a string 
DECLARE @string varchar(50)  
SET @string = 'this is a string'  
SELECT CharIndex(' ', @string) Example1  
, CharIndex(NULL, @string) Example2 
, CharIndex(NULL, @string, NULL) Example3 
, CharIndex(NULL, NULL, NULL) Example4 
Output:
Example1 Example2 Example3 Example4
5 NULL NULL NULL

If in the first example the function is used to retrieve the first occurrence of a space in the given string, in the last three examples is highlighted the behavior of the function in case one of the parameters is NULL, in such cases the function returning a NULL – this means that when other behavior is expected, it falls in your attributions to handle the NULL values accordingly.

Here’s a second set of examples based on the search of a string within another string. As can be seen there isn’t a difference at all in how the function is applied:

-- searching a string into a string
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_is_another_string'  
SET @search = 'is'  
SELECT @string Example1 -- simple string  
, CharIndex(@search, @string) Example2 -- occurring position  
, CharIndex(@search, @string, 5) Example3 -- second occurring position 
Output:
Example1 Example2 Example3
this_is_another_string 3 6

From the three examples, the third deserves some special attention because it attempts to retrieve the second occurrence of the string by using a hardcoded starting value. In fact the search could start a position further from the first occurrence, something like in the below example:

-- searching a string into a string in foward/reverse order 
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_was_a_string'  
SET @search = '_'  
SELECT @string Example1 -- simple string  
, CharIndex(@search, @string) Example2 -- 1st occuring position (forward)  
, CharIndex(@search, @string, CharIndex(@search, @string)+1) Example3 -- 2nd occuring position (forward)  
, CharIndex(@search, @string, CharIndex(@search, @string, CharIndex(@search, @string)+1)+1) Example4 -- 3rd 
occuring position (forward)
Output:
Example1 Example2 Example3 Example4
this_was_a_string 5 9 11

The logic could be repeated over and over again in order to retrieve the n-th position, list all the positions of occurrence or the total number of occurrences.

Even if SQL Server doesn’t provide a function for retrieving the first occurrence in backward order, in other words starting from the end to the start, there is actually a simple trick that might do it. As SQL Server provides a Reverse function, which reverses the value of the string provided as parameter.

-- 1st occurring position backwards 
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_was_a_string'  
SET @search = '_'  
SELECT Len(@string) - CharIndex(@search, Reverse(@string))+1 Example 

Unfortunately the logic needed to retrieve the second and following occurrences starting from a given position it’s not so easy readable. Therefore, for the sake of “simplicity”, here’s the logic for reverse processing incorporated in CharIndexRev function:

-- reverse CharIndex 
CREATE FUNCTION dbo.CharIndexRev( 
@search varchar(50) 
, @string varchar(500) 
, @start int = 0) 
RETURNS int 
AS BEGIN 
RETURN Len(@string) - CharIndex(@search, Reverse(@string), CASE WHEN IsNull(@start, 0) != 0 THEN Len(@string) - @start+1 ELSE 0 END)+1 
END 

And here’s the function at work:
 
-- searching a string into a string in reverse order 
DECLARE @string varchar(50) 
DECLARE @search varchar(50) 
SET @string = 'this_was_a_string' 
SET @search = '_' 
SELECT dbo.CharIndexRev(NULL, @string, 0) Example1 
, dbo.CharIndexRev(@search, NULL, 0) Example2 
, dbo.CharIndexRev(NULL, NULL, 0) Example3 
, dbo.CharIndexRev(@search, @string, 0) Example4 
, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, 0)-1) Example5 
, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, 0)-1)-1) Example6 
Output:
Example1 Example2 Example3 Example4 Example5 Example6
NULL NULL NULL 11 9 5

As can be seen, the output of the last three examples matches the reversed output of the previous set (11, 9, 5) vs. (5, 9, 11), while the first three examples exemplify the behavior of the function when one of the parameters is Null.

Notes:
The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

01 January 2011

🔖Book Review: Lynn Beighley's "Head First SQL" (2011)

I love the Head First O’Reilly book series which besides the book on SQL, has acquired over its short existence several other titles like the ones on Data Analysis, Statistics and Excel, topics strongly correlated with the thematic of the current blog, but also related tangential topics like Project Management, Design Patterns, Object-Oriented Analysis and Design, Software Development, Web Design, Programming, PHP & MySQL, HTML with CSS & XHTML, C#, Java, AJAX, JavaScript, Pyhton, Servlets and JSP, EJB or Rails. (The book series contains more titles which seems to be available also at Head First Labs together with other supporting information.) 

What I like about this book series is that approaches the themes from a basic level, using a rich set of "brain-friendly" visual representations, conversational and personalized style, challenging food for thought, plus some advices on how to use book’s format, facts that allow the reader to make most of the learning process. In fact the books are approachable for any newbies in IT branch, so I recommend the whole set of books to beginners as well to experienced programmers, considering the fact that everybody has something to learn, at least when considering the presentational format of the book.

Coming back to the current title, Head First SQL, it covers almost everything somebody has to know about database design, DML and DDL statements, database objects and security in a nutshell. It focuses only on the SQL Standard (I wish I know which one as there are several versions/revisions, see SQL 1999), so it could be used for all type of RDBMS that implements the standard, even if the book is built around MySQL. A consequence of this aspect is that any vendor specific flavors won’t be found in here, so the book should be used in combination with the documentation or the books that target the vendor-specific functionality. There are also several topics not covered but mentioned roughly: reserved words and special characters, temporary tables, data conversion, some numeric functions and operators, indexes, etc.

I must remark that some of the definitions are somehow ambiguous, and I feel that in some cases additional clarifications are necessary. Some examples in this direction could be considered the very first important definitions, the ones of the database seen as "a container that holds tables and SQL structures related to those tables", the one of column seen as piece of data stored by a table, or of a row seen as a single set of columns that describe attributes of a single thing. The first definition is quite acceptable if it’s ignored the existence of non-relational databases (it’s also true that the book targets RDBMS even if not evident), however the other two definitions are too vague, at least from my point of view. 

It’s also true that more rigor and detail would involve the introduction of other concepts (e.g. entity, matrix, etc.), requiring more space and maybe complicate the topics. The reader could intuit from explanatory text and examples what’s all about, however expecting the reader to bridge such gaps should be avoided especially when we consider the intrinsic nature of this book in particular, and IT books in general. Anyway, this shouldn’t stop you discover and enjoy the book!

Notes:
1. As I learned on the hard way, not all book formats and writing styles are matching everybody’s taste and expectations. The simplistic and rich visually format of the book could annoy as well, so might be a good idea to browse first the book on Scribd before buying it.
2. The source code is available here.

31 December 2010

Meta-Blogging: Past, Present and Future

Introduction

   Even if I started to blog 3-4 years ago, only this year (still 2010) I started to allocate more time for blogging, having two blogs on which I try to post something periodically: SQL Troubles and The Web of Knowledge plus a homonym Facebook supporting group (for the later blog). As a parenthesis, the two blogs are approaching related topics from different perspective, the first focusing on data related topics, while the second approaching data from knowledge and web perspective; because several posts qualify for both blogs, I was thinking to merge the two blogs, though given the different perspectives and types of domains that deal with them, at least for the moment I’ll keep them apart. Closing the parenthesis, I would like to point out that I would love to allocate more time though I have to balance between blogging, my professional and personal life, and even if the three have many points in common, some delimitation it’s necessary. Because it’s the end of a year, I was thinking that it’s maybe the best time to draw the line and analyze the achievements of the previous year and the expectations for the next year(s), for each of the two blogs. So here are my thoughts:

Past and Present

    There are already more than 10 years since I started to work with the various database systems, my work ranging from data modeling to database development, reporting, ERP systems, etc. I can’t consider myself an expert, though I’ve accumulated experience in a whole range of areas, fact that I think entitles me to say that I have something to write about, even if the respective themes are not rocket science. In addition, it’s the human endeavor of learning something new each day, and in IT that’s quite an imperative, the evolvement of various technologies requesting those who are working in this domain to spend extra hours in learning new things or of consolidating or reusing knowledge in new ways. I considered at that time, and I still do, that blogging helps the learning process, allowing me to externalize the old or new knowledge, clear my thoughts, have also some kind of testimony of what I know or at least a repository of information I could reuse when needed, and eventually receive some feedback. These are few of the reasons for which this blog was born, and I hope the information presented in here are useful also for other people.

  During the past year I made it to post on my blog more than 100 entries on various topics, the thematic revolving around strings, hierarchical queries, CLR functionality, Data Quality, SSIS, ERPs, Reports, troubleshooting, best practices, joins, etc. Not all the posts rose to my expectations, though that’s a start, hoping that I will find a personal style and the quality of the posts will increase. I can’t say I received lot of feedback, however based on the user access’ statistics provided by Clustrmaps and Google the number of visitors this year was somewhere around 8500, close to my expectations. Talking about the number of visitors, it’s nice to have also some visualization, so for this year’s statistics I’ll use Clustrmaps visualization, which provides a more detailed geographical overview than Google’s Stats, while for trending I show below Google’s Stats (contains data from May until today):

The Web of Knowledge - Clustrmaps 2010 statistics

The Web of Knowledge - Google 2010 statistics

     What I find great about Google’s Stats is that it provides also an overview of the most accessed posts and the traffic sources. There are also some statistics of the audience per browsers and OS, though they are less important for my blogging requirements, at least for the moment.

The Web of Knowledge - pageviews by OS The Web of Knowledge - pageviews by browsers

  What I find interesting is that most visited posts and searched keywords were targeting SSIS and Oracle vs. SQL Server-related topics. So, if for the future I want more traffic than maybe I should diversify my topics in this direction.

Future

  I realize that I started many topics, having in the next year to continue posting on the same, but also targeting new topics like Relational Theory, Business Intelligence, Data Mining, Data Management, Statistics, SQL Server internals, data technologies, etc. Many of the posts will be an extension of my research on the above topics, and I was thinking to post also my learning notes with the hope that I will receive more feedback. I realized that I need to be more active and provide more feedback to other blogs, using the respective comments as gateways to my blog and try to build a network around it. I was thinking also to start a Facebook “support group”, posting the links I discovered, quotes or impressions in a more condensed form, but again this will take me more time, so I’m not sure if it makes sense to do that. Maybe I should post them directly on the blog, however I wanted my posts to be a little more consistent than that. Anyway I know also that I won’t manage to post more than an average of one post per week though per current expectations is ok.

  Right now all the posts are following a push model, in other words I push the content independently of whether there is a demand or not for it. It’s actually natural because the blog is having a personal note. In the future I’m expecting to move in the direction of a pull model, in other words to write on topics requested by readers, however for this I need more feedback from you, the reader. So please let me know what topics you’d like to read!

  I close here, hoping that the coming year (2011) will be much better than the current one. I wish to all of you, a Happy New Year!

14 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server III: (Concatenation)

Typically, a database in general, and a table in particular, that follows the normalization rules, is designed to have the columns contain the smallest semantic chunks of data, it could be a Street, a Zip Code, City, a person’s First or Last Name, but also a large chunk of text like a Description or a Comment. No matter how well designed is a database, there will always be the need to do various operations with strings, typically concatenation, extraction of subpart of a string, insertion or deletion of characters, rearangement of string’s characters, trimming, splitting it in substrings, or of getting various numeric values: length, position of a given text, number of not empty characters, on whether the text represents a valid numeric or date values, etc. In the following posts I will attempt to address the respective operations in the context of select statements, and let’s start with concatenation. 

Concatenation is the operation of joining two or more string values within an expression. In SQL Server the “+” operator is used for concatenation, and it could be used to concatenate two or more members. In order to concatenate two members from which one of them is a string, the second term needs to be explicitly converted to a string data type, otherwise an error will occur. For readability or post-processing purposes, the strings are concatenated using a delimiter in order to delimit the boundaries of the initial value, it could be used a space, a comma, a semicolon, a pipe, a tab or any other character that could be used typically in delimiting columns.  

-- concatenation of strings 
SELECT 'a string' + ' and ' + 'a second string' Example1  
, 'a string' + ',' + 'a second string' Example2  
, '1245' + '67890' Example3  
, '1245' + '.' + '67890' Example4 

The concatenation of string variables or columns functions based on the same principles: 

-- concatenating string variables 
DECLARE @string1 varchar(50) 
DECLARE @string2 varchar(50) 
DECLARE @string3 varchar(50) 

SET @string1 = 'this is a string'  
SET @string2 = 'this is another string'  

SELECT @string1 + ' and ' + @string2 Example1 
, @string1 + char(31) + @string2 Example2 
, @string1 + ', ' + @string2 Example3 
, @string1 + ' ' + @string3 Example4 
, @string1 + IsNull(@string3, '!') Example5 
 
concatenation 2 

Here’s another example based on the concatenation of columns coming from two joined tables from AdventureWorks database:
 
-- concatenating columns of joined tables 
SELECT PAD.AddressID 
, IsNull(PAD.AddressLine1, '')  
+ IsNull(', ' + PAD.AddressLine2, '') 
+ IsNull(', ' + PAD.City, '') 
+ IsNull(', ' + PAD.PostalCode, '') 
+ IsNull(', ' + PSP.Name, '') Address 
FROM Person.Address PAD 
      JOIN Person.StateProvince PSP 
         ON PAD.StateProvinceID = PSP.StateProvinceID 
 
concatenation 3

As stressed in the previous post, the NULL values need to be adequately handled either by initializing values or by using the IsNull or COALESCE functions. The concatenation of strings combined with IsNull function could be used creatively in order to add a comma only when a value is not null, as in the above example.

There are scenarios in which is needed to concatenate the values belonging to the same column but from different records, for example concatenating the email values in order to send a single email to all the users in one single action. Before the introduction of common table expressions (CTE), wasn’t possible to concatenate the string values belonging to different records, at least not in a query, this functionality being achieved by using cursors or loops, or simply performed on client or intermediate layers. As I already gave an example on how to use cursor in order to loop through the values of a table and concatenate them (see “Cursors and Lists” post), I will focus on the use of loops and simple CTEs.

Loops are one of the basic functionality in programming languages, no matter of their complexity or type. Either if are talking about WHILE, FOR, foreach or do … until loops, the principles are the same: perform a list of actions until one or more conditions are met. In this case the actions performed is reduced to a set of concatenations based on the letters of the (English) alphabet:

-- concatenation within a loop 
DECLARE @list varchar(max) 
DECLARE @index int  
SET @list = '' 
SET @index = ASCII('a') WHILE (@index<ASCII('z')) 
BEGIN 
     SET @list = @list + ', ' + Char(@index) 
     SET @index = @index + 1 
END 
SELECT @list Result 

There is more work that needs to be performed in order to remove the leading comma from the output, but that’s a topic for the next post, when discussing about decomposition of strings.

CTEs are a little more complex to use than the loops, though the concatenation could be achieved across records and this in one query and not in procedural language as in the above example. In order delimit the two components of a CTE, I made use of a second CTE which simulates the existence of a given table:
 
-- concatenation across records 
;WITH Data(Column1, Ranking)  
AS  
( -- preparing test data 
       SELECT 'A' Column1, 0 Ranking  
       UNION ALL  
       SELECT 'B' Column1, 1 Ranking  
       UNION ALL  
       SELECT 'C' Column1, 2 Ranking 
) 
, Result(Column1, Ranking)  
AS 
(  -- performing the actual concatenation 
      SELECT Cast(Column1 as nvarchar(max)) Column1 , Ranking 
      FROM Data 
      WHERE Ranking = 0  
      UNION ALL 
      SELECT Cast(B.Column1 + ',' + A.Column1 as nvarchar(max)) Column1 , A.Ranking  
      FROM Data A  
         JOIN Result  B  
             ON A.Ranking - 1 = B.Ranking  
) SELECT Column1  
FROM Result  
WHERE Ranking IN (SELECT MAX(Ranking) FROM Result) 
  
The logic for doing a simple concatenation seems maybe complicated, though the volume of work is not so big if we ignore the first CTE. On the other side I introduced an “index” within the Ranking column, fact that allows processing easier the records. When dealing with the records coming from a table it’s probably much easier to use one of the ranking functions that suits best.

Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.