11 March 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VIII (Insertions, Deletions and Replacements)

Until now, the operations with strings resumed to concatenation and its reverse operation(s) - extracting a substring or splitting a string into substrings. It was just the warm up! There are several other important operations that involve the internal manipulation of strings – insertion, deletion and replacement of a substring in a given string, operations performed using the Replace and Stuff functions.

Replace function, as its name denotes, replaces all occurrences of a specified string value with another. Several scenarios in which the function is quite useful: the replacement of delimiters, special characters, correcting misspelled words or any other chunks of text. Here are some basic simple examples, following to consider the before mentioned applications in other posts:

-- examples with replace 
DECLARE @str varchar(30) 
SET @str = 'this is a test string' 
SELECT replace(@str, ' ', ',') Example1 
, replace(@str, ' ', ' ') Example2 
, replace(@str, ' ', '') Example3 
, replace(@str, 'is', 'as') Example4  
Output:
Example1 Example2 Example3 Example4
this,is,a,test,string this is a test string thisisateststring thas as a test string

When there are good chances that the searched string won’t appear in the “searched” string, and especially when additional logic is depending on the replacement, logic that could be included in the same expression with the replacement, then maybe it makes sense to check first if the searched character is present:

-- replacement with check 
DECLARE @str varchar(30) 
DECLARE @search varchar(30) 
DECLARE @replacememt varchar(30) 
SET @str = 'this is a test string' 
SET @search = 'this string' 
SET @replacememt = 'other string' 
SELECT CASE            
    WHEN CharIndex(@search, @str)>0 THEN Replace(@str, @search, @replacememt)             
    ELSE @str        
END result 

Unfortunately the function doesn’t have the flexibility of the homonym functions provided by the languages from the family of VB (VBScript, VB.NET), which allow to do the replacement starting with a given position, and/or for a given number of occurrences. This type of behavior could be obtained with a simple trick – splitting the string into two other strings, performing the replacement on the second string, and then concatenating the first string and the result of the replacement:

-- replacement starting with a given position 
DECLARE @str varchar(30) 
DECLARE @search varchar(30) 
DECLARE @replacememt varchar(30) 
DECLARE @start int 
SET @str = 'this is a test string' 
SET @search = 's' 
SET @replacememt = 'x' 
SET @start = 7 
SELECT Left(@str, @start-1) FirstPart 
, RIGHT(@str, Len(@str)-@start+1) SecondPart 
, CASE        
    WHEN @start <= LEN(@str) THEN Left(@str, @start-1) + Replace(RIGHT(@str, Len(@str)-@start+1), @search, @replacememt)        
     ELSE @str  
END Replacement 
Output:
FirstPart SecondPart Replacement
this i s a test string this ix a text xtring

The logic can be encapsulated in a function together with additional validation logic.

Stuff function inserts a string into another string starting with a given position and deleting a specified number of characters. Even if seldom used, the function it’s quite powerful allowing to insert a string in another, to remove a part of a string or more general, to replace a single occurrence of a string with another string, as can be seen from the below examples:
 
-- Stuff-based examples DECLARE @str varchar(30) 
SET @str = 'this is a test string' 
SELECT STUFF(@str, 6, 2, 'was ') Example1 
, STUFF(@str, 1, 0, 'and ') Example2 
, STUFF(@str, 1, 0, 'that') Example3 
, STUFF(@str, LEN(@str) + 1, 0, '!') Example4 
Output:
Example1 Example2 Example3 Example4
this was a test string and this is a test string thatthis is a test string NULL

If in the first example is done a replacement of a text from a fix position, in the next examples are attempted insert on a first, middle respectively end position. As can be seen, the last example doesn’t work as expected, this because the insert position can’t go over the length of the target string. Actually, if the insert needs to be done at the beginning, respectively the end of a string, a concatenation can be much easier to use. A such example is the padding of strings with leading or trailing characters, typically in order to arrive to a given length. SQL Server doesn’t provide such a function, however the function is quite easy to build.
 
-- left/right padding DECLARE @str varchar(30) 
DECLARE @length int  
DECLARE @padchar varchar(1) 
SET @str = '12345'  
SET @length = 10 
SET @padchar = '0' 
SELECT @str StringToPad  
, CASE  
     WHEN LEN(@str)<@length THEN Replicate(@padchar, @length-LEN(@str)) + @str       
     ELSE @str  
END LeftPadding  
, CASE  
     WHEN LEN(@str)<@length THEN @str + Replicate(@padchar, @length-LEN(@str))      
     ELSE @str  
END RightPadding 
 
Output:
StringToPad LeftPadding RightPadding
12345 0000012345 1234500000

Happy Coding!

24 February 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VI (Subparts of a String)

No matter how normalized a database is, there will always be the need to encode multiple semantic entities in a string, needing thus to extract them later. For example data like the Street Name and Number, or the Country Phone Prefix and Phone Number, the First and the Last Name, etc. Another frequent scenario is the misuse of a long-sized string data types to store multiple delimited attributes or simply a whole text like a Comment or Description. The easiest scenarios to deal with are the ones when you know the rules behind your encoding, preferably dealing with a fix length encoding or only a given delimiter. An example of fix length encoding is the IBAN Number, the country specific VAT Number or any other artificial constructed/standardized encoding. According to Wikipedia, IBAN (International Bank Account) Number consists of  ISO 3166-1 alpha-2 country code, followed by two check digits that are calculated using a mod-97 technique, and Basic Bank Account Number (BBAN) with up to thirty alphanumeric characters. Taking the IBAN example provided for Switzerland, it could be stored as “CH93 0076 2011 6238 5295 7” or “CH9300762011623852957”, in either case when the country is not provided explicitly it would be interesting to extract it from the IBAN together with the BBAN. How would we do that?

Many of the programming languages I worked with provide a function for extracting a substring from a string - Mid (VB/VBScript), Substring in C# Substr in Oracle, etc. SQL Server is actually providing three functions for this purpose: Substring, Left, respectively Right. The Substring function extracts a substring of a given length starting from a given position. The Left and Right functions return, as their name suggests, the left part, respectively the right part of a character string with the specified number of characters. The use of Left and Right functions seems to be redundant, as they are a particular case of Substring, however they can simplify sometimes the code, as can be seen from below example based on the above IBAN example.

-- parsing a VAT Number 
DECLARE @VAT varchar(50) 
SET @VAT = 'CH9300762011623852957' 
SELECT @VAT VATNumber 
, LEFT(@VAT, 2) CountryCode1 
, SUBSTRING(@VAT, 1, 2) CountryCode2 
, SUBSTRING(@VAT, 3, 2) CheckDigits 
, RIGHT(@VAT, Len(@VAT)-4) BBAN1 
, SUBSTRING(@VAT, 5, Len(@VAT)-4) BBAN2 
, CASE  
     WHEN LEN(@str)<@length THEN Replicate(@padchar, @length-LEN(@str)) + @str       
     ELSE @str  
END LeftPadding  
, CASE  
     WHEN LEN(@str)<@length THEN @str + Replicate(@padchar, @length-LEN(@str))      
     ELSE @str  
END RightPadding 

Substrings Example 1

Even if the IBAN has a variable-length component (the BBAN) given the fact that the other two components are fixed, this allows us to clearly extract each component. The example shows also the equivalent call of Substring function for Left (Country Code extraction), respectively Right (BBAN extraction).

What happens if there are more than one variable-length components? For such scenarios it’s useful to introduce a delimiter, it could be a comma, a dash, space, etc. It’s the case of a First and Last Name stored in the same attribute. Normally only one component qualifies as Last Name, and for simplicity let’s consider it as being stored first and space as delimiter. In order to identify the components, it’s enough to identify the first occurrence of the delimiter by using the CharIndex function.
 
-- parsing a Person's Name 
DECLARE @Name varchar(50) 
SET @Name = 'Stone Sharon' 
SELECT @Name Name 
, LEFT(@Name, CHARINDEX(' ', @Name)) LastName 
, RIGHT(@Name, LEN(@Name) - CHARINDEX(' ', @Name)) FirstName 
Output:
Name LastName FirstName
Stone Sharon Stone Sharon

The code for cutting the left, respectively right part of a string is pretty simple and over the years I used it quite often, so it makes sense to encapsulate it in a function., like I did in an older post.

When a delimiter is used repeatedly in a string, normally we need to identify each component in the string. Sometimes the number of components can be given, other times not. For this purpose can be used a common table expression, and here is another example in which the space is used as delimiter – extracting the words from a given sentence.

-- extracting the words from a sentence 
DECLARE @str nvarchar(100) 
SET @str = 'This is just a test' 
;WITH CTE (PrevString, Position, Word) 
AS ( 
    SELECT LTrim(RTrim( CASE  
         WHEN CharIndex(' ', @str)>0 THEN Right(@str, Len(@str)-CharIndex(' ', @str)) 
         ELSE '' 
     END)) PrevString 
, 1 Position  
, LTrim(RTrim(CASE  
     WHEN CharIndex(' ', @str)>0 THEN LEFT(@str, CharIndex(' ', @str)) 
     ELSE @str 
END)) Word 
UNION ALL  
SELECT LTrim(RTrim(CASE  
     WHEN CharIndex(' ', PrevString)>0 THEN Right(PrevString, Len(PrevString)-CharIndex(' ', PrevString)) 
     ELSE '' 
END)) PrevString 
, Position + 1 Position  
, LTrim(RTrim(CASE  
     WHEN CharIndex(' ', PrevString)>0 THEN LEFT(PrevString, CharIndex(' ', PrevString)) 
     ELSE PrevString 
END)) Word 
FROM CTE  
WHERE Len(PrevString)>0 
)  
SELECT PrevString 
, Word  
, Position 
FROM CTE 
OPTION (maxrecursion 100)
Output
PrevString Word Position
is just a test This 1
just a test is 2
a test just 3
test a 4
test 5

The logic works for a sentence, and if we ignore the fact that some punctuation signs are appearing at the end of the words, it might work  as well for a whole phrase, considering that the punctuation signs can be replaced from the end result. It would be useful for example to generalize the logic for a set of delimiters, in this case the other punctuation signs (e.g. “,”, “;”, “!”, etc.), however this would mean to identify which of the delimiters is used first or to apply the the same logic for the first delimiter, then for the second and so on. In addition, if the number of encoded elements within a value remain the same, a pivot can be applied on the final result and have thus all values’ elements within the same row.

Happy Coding!   

05 February 2011

💎SQL Reloaded: Deleting Sequential Data From a Table

    Last week I run into an interesting solution to a simple problem, problem I dealt with in several occasions too: how to delete (efficiently) sequential data, or how LuborK calls it in his post, “a problem of efficient ordered delete”.  In this case he’s talking about the deletion of the first n records from a dataset of sequential data. It sounds like a simple problem considering that we can easily select in a first step the first n records using the TOP function and then delete the respective matched records. That’s also what LuborK does, however the novelty of the solution proposed it’s a little unexpected, and this not from the point of view of the approach, but of the gain in performance. The bottom line: by encapsulating the inner query within a view, the deletion of sequential data becomes more efficient.

    While reading the respective post I thought: what if the inner query is encapsulated in a inline table-valued function?! Would the performance remain the same or deprecate? Does it make sense to add the overhead of creating a view for each such scenario in order to gain some performance? Then, reading the comments, one of them was pointing something so obvious: the inner query can be encapsulated within a common table expression, following to perform on it the deletion. An elegant solution I haven’t had available under SQL Server 2000 at the times I had to solve the same problem. Actually my approach was a little different, at that time identifying and storing the greatest value of a new data set, building thus a collection of values that would allow me to select a given range. This approach was quite useful in the respective scenarios and quite acceptable as performance, plus the fact that I was reusing the greatest value in further logic – two birds with one shot.

    This being said, I was thinking that except the methods provided by LuborK, I could provide in this post several other simple techniques. Please note that I’m not focusing on performance, but on the techniques. Eventually you can display the statistics related to the disk activity and time required to parse, compile and execute each batch of statements. So, let’s start with the creation and population of the table! In order to minimize the differences, I took the same example and added just a little make-up – comments and a small change in the way the values of the second column are inserted.  

-- dropping the table 
DROP TABLE dbo.TestTable 

--declaring the table 
CREATE TABLE dbo.TestTable ( 
  id int primary key 
, value varchar (50))  

-- inserting the records in a loop 
DECLARE @index int  
SET @index=1  
SET NOCOUNT ON  
WHILE (@index<100000)  
BEGIN 
    INSERT INTO dbo.TestTable  
    VALUES (@index,'test '+cast(@index as varchar(10))) 
    SET @index=@index+1  
END  

    Supposing we are interested in the first 10000 of records of a sequential data set, it would be much easier if we would know the maximum value from the respective data set. Then we could use the value to perform a simple deletion:
 
-- deletion based on stored max value DECLARE @max int 
SELECT @max = MAX(ID) 
FROM( 
    SELECT top(10000) id 
    FROM TestTable  
    ORDER BY ID 
) A 

--deleting the data  
DELETE FROM TestTable 
WHERE id <= @max 

    If we don’t need the maximum value for further logic, the two statements in can be combined in one query:

-- deletion based on max value - within one query DELETE FROM TestTable 
WHERE id <= ( 
     SELECT MAX(ID) 
     FROM ( 
          SELECT top(10000) id 
          FROM dbo.TestTable  
          ORDER BY ID 
     ) A 
) 

--deleting the data  
DELETE FROM TestTable 
WHERE id <= @max 
       
    When dealing with huge volumes of data, and not only then, in case of sequential processing we can store the maximum value in a given table, and pull the values as required. This approach allows us to process any interval within the data set, in this case the delete resumes to a simple delete statement in which @start and @end were retrieved from the respective table:
 
-- deleting the data within a given interval 
DELETE FROM dbo.TestTable 
WHERE id BETWEEN @start AND @end 

   If this seems trivial, let’s spice up things! As I observed, few people know that it’s possible to delete data from a given table using an inline table-valued function. Huh, what’s that?! If it seems impossible to you, take a look at documentation! I’m referring here to the GRANT object permissions section in which of interest is the following statement that refers to the ANSI 92 permissions applicable to the various database objects, for table-valued function being listed the following permissions: DELETE, INSERT, REFERENCES, SELECT, UPDATE. In order to exemplify the idea, let’s create the UDF:
 
-- creating the UDF function 
CREATE FUNCTION dbo.TestUDF1( 
@start int 
, @end int) 
RETURNS TABLE 
AS RETURN ( 
    SELECT id, value  
    FROM dbo.TestTable  
    WHERE id BETWEEN @start AND @end 
)  

--deleting the data 
 DELETE FROM dbo.TestUDF1(20001, 30000) 

    Voila! The example is supposed to work, unless you are having a problem with the permissions. Let’s consider this functionality from the perspective of our problem! What LuborK does is to include the inner query in a view and then delete the data:
 
-- creating the view 

CREATE VIEW dbo.vTestView 
AS 
SELECT top(10000) id 
FROM dbo.TestTable  
ORDER BY ID  


--deleting the data 
DELETE FROM dbo.vTestView 

    One of the problems with this approach is that the value representing the number of records to be returned is hardcoded. And here comes to help an inline table-valued function, also referred as a “parameterized view”: 

 -- creating the UDF function 
 CREATE FUNCTION dbo.TestUDF2( 
@number_records int) 
RETURNS TABLE 
AS RETURN ( 
    SELECT top (@number_records) id, value  
    FROM dbo.TestTable  
    ORDER BY id 
)        

--deleting the data 
DELETE FROM dbo.TestUDF2(10001)  

    And, as closure, the example based on the common table expression:

 -- deletion using a CTE 
;WITH CTE  
AS ( 
     SELECT top(10000) id 
     FROM dbo.TestTable  
     ORDER BY ID 
)  
DELETE FROM CTE 

30 January 2011

🛠️ Resources: SQL Server

During the past weeks I found several interesting learning resources on SQL Server 2008 topics, so here they are:

For those interest to upgrade their knowledge to SQL Server 2008 R2 Edition check the Microsoft SQL Server 2008 R2 Update for Developers Training Course videos on Channel9. They cover topics like Location Awareness, CLR Integration, StreamInsight, Reporting Services, Application and Multi-Server Management, improvements on BLOB, T-SQL and tools for Excel 2010 or SharePoint 2010.

The ER diagrams of SQL Server’s DMVs or DMFs are available for download as PDF at SQL Server 2008 Systems Views Map, and the same for SQL Server 2005, respectively SQL Server 2008 R2. Even if not all objects and possible relations are shown, they could still save you from lot of effort as the diagram depicts the relations between the most important entities and the cardinality existing between them, plus the additional constraints participating in the joins. The entities represented: Objects, Types and Indexes, Linked Servers, CLR, Database Mirroring, Service Broker, Resource Governor, Transactions, Databases and Storage, Traces and Events, Execution Environment, Server information.

There are several free SQL Server books on SQLServerCentral.com coming from Redgate: A. Kuznetsov’s Defensive Database Programming, B. McGehee’s SQL Server Maintenance Plans, How to become an Exceptional DBA, Mastering SQL Server Profiler and Brad’s Sure Guide to SQL Server 2008, J. Magnabosco’s Protecting SQL Server Data, G. Fritchey’s SQL Server Execution Plans and SQL Server Crib Sheet Compendium (coauthor with A. Prasanna), R. Landrum’s SQL Server Tacklebox, plus several volumes of ‘Best of SQL Server Central’ authors in Vol. 1-4, Vol. 5, Vol. 6 and Vol. 7, and several other books to come.

It seems there are a few new books also on Scribd website: R. Colledge’s SQL Server Administration in Action, R. Vieira’s Professional Microsoft SQL Server 2008 Programming, M. Lee and M. Mansfield’s SQL Server 2008 Administration Instant Reference, plus many other books that can be retrieved by searching SQL Server 2008 on the respective site.


There are several papers coming from Microsoft and several professionals. It worth to check them:
-   K. Lassen’s paper on Using Management Data Warehouse for Performance Monitoring, in which he provides several best practices for performance management of SQL Server 2008. Except the introduction in Management Data Warehouse features, he provides also an extension of the built-in functionality for indexes, plus several functions and queries.
-  D. Kiely's paper on SQL Server 2008 R2 Security Overview for Database Administrators covers some of the most important security features in SQL Server 2008. There is a similar paper targeting SQL Server 2005.
-  B. Beauchemin’s paper on SQL Server 2005 Security Best Practices - Operational and Administrative Tasks describes the best practices for setting up and maintaining security in SQL Server 2005.
-  SQL Server 2008 Compliance Guide a paper written to help organizations and individuals understand how to use the features of the Microsoft® SQL Server® 2008 database software to address their compliance need.
-  SQL Server 2008 Upgrade Technical Reference Guide essential phases, steps and best practices to upgrade existing instances of SQL Server 2000 and 2005 to SQL Server 2008.
-  Technet Wiki Checklists on Database Engine Security, Database Engine Security, Database Engine Connections and  Data Access. (probably more to come)
-   Also Microsoft maintains a list of White Papers on SQL Server 2008, respectively SQL Server 2005.

07 January 2011

💎SQL Reloaded: Pulling the Strings of SQL Server IV (Spaces, Trimming, Length and Comparisons)

    In the previous post on concatenation, I was talking about the importance of spaces and other delimiters in making concatenations’ output more “readable”. Excepting their importance in natural language, the spaces have some further implication in the way strings are stored and processed. As remarked in the introductory post from this topic, there are two types of spaces that stand out in the crowds of spaces, namely the trailing spaces, the spaces found at the right extremity of a string,  respectively the leading spaces, the spaces found at the left extremity of a string. Are few the cases when the two trailing space are of any use, therefore databases like SQL Server usually ignore them. The philosophy about leading space is slightly different because there are cases in which they are used in order to align the text to the right, however there are tools which are cutting off the leading spaces. When no such tools are available or any of the two types of spaces are not cut off, then we’ll have do to it ourselves, and here we come to the first topic of this post, trimming.

Trimming

   Trimming is the operation of removing the empty spaces found at the endings of a string. Unlike other programming languages which use only one function for this purpose (e.g. Trim function in VB or Oracle), SQL Server makes use of two functions used for this purpose, LTrim used to trim the spaces found at the left ending of the string, respectively RTrim, used to trim the spaces found at the right ending of the string.

-- trimming a string 
SELECT  LTrim(' this is a string ') Length1 -- left trimming 
, RTrim(' this is a string ') Length2 --right trimming 
, LTrim(RTrim(' this is a string ')) Length2 --left & right trimming 

    As can be seen it’s not so easy to identify the differences, maybe the next function will help to see that there is actually a difference.

Note:
    If it looks like the two trimming functions are not working with strings having leading or trailing spaces, then maybe you are not dealing with an empty character but rather with other characters like CR, LF, CRLF or other similar characters, rendered sometimes like an empty character.

Length

   Before approaching other operations with strings, it’s maybe useful (actually necessary as we will see) to get a glimpse of the way we can determine the length of a string value, in other words how many characters it has, this being possible by using the Len function:

-- length of a string 
SELECT Len('this is a string') Length1 -- simple string 
, Len('this is a string ') Length2 --ending in space 
, Len(' this is a string') Length3 --starting with a space 
, Len(' this is a string ') Length4 --starting &amp; ending with a space 
, Len(LTrim(' this is a string ')) Length5 length & left trimming 
,Len(RTrim(' this is a string ')) Length5 length & right trimming 
,Len(LTrim(RTrim(' this is a string '))) Length5 length, left & right trimming    

In order to understand the above results, one observation is necessary: if a strings ends in with one or more empty characters, the Len function ignores them, though this doesn’t happen with the leading empty characters, they needing to be removed explicitly if needed.

Comparisons

    The comparison operation points the differences or similarities existing between two data types, involving at minimum two expressions that reduce at runtime to a data type and the comparison operator. This means that each member of comparison could include any valid combinations of functions as long they are reduced to compatible data types. In what concerns the comparison of strings, things are relatively simple, the comparison being allowed  independently on whether they have fix or varying length. Relatively simple because if we’d have to go into details, then we’d need to talk about character sets (also called character encoding or character maps) and other string goodies the ANSI SQL standard(s) are coming with, including a set of rules that dictate the behavior of comparisons. So, let’s keep things as simple as possible. As per above attempt of definition, a comparison implies typically an equality, respectively difference, based on equal (“=”), respectively not equal (“<>” or “!=”). Here are some simple examples:

-- sample comparisons 
SELECT CASE WHEN 'abc' != 'abc ' THEN 1 ELSE 0 END Example1 
, CASE WHEN ' abc' != 'abc' THEN 1 ELSE 0 END Example2 
, CASE WHEN ' ' != '' THEN 1 ELSE 0 END Example3 
-- error comparison , CASE WHEN 'abc' != NULL THEN 1 ELSE 0 END Example4 
, CASE WHEN 'abc' = NULL THEN 1 ELSE 0 END Example5 
-- adequate NULL comparison , CASE WHEN 'abc' IS NOT NULL THEN 1 ELSE 0 END Example6  
, CASE WHEN 'abc' IS NULL THEN 1 ELSE 0 END Example7 

comparisons - set 1

  The first three examples are demonstrating again the behavior of leading, respectively trailing spaces. The next two examples, even if they seem quite logical in terms of natural language semantics, they are wrong from the point of view of SQL semantics, and this because the comparison of values in which one of them is NULL equates to a NULL, thus resulting the above behavior in which both expressions from the 4th and 5th example equate to false. The next two examples show how the NULLs should be handled in comparisons with the help of IS operator, respectively it’s negation – IS NOT. 

   Like in the case of numeric values, the comparison between two strings could be expressed by using the “less than” (“<;”) and “greater than” (“?”) operators, alone or in combination with the equality operator (“<=”, “>=”) or the negation operator (“!>”, “<!”) (see comparison operators in MDSN). Typically an SQL Server database is case insensitive, so there  will be no difference between the following strings: “ABC”, “abc”, “Abc”, etc. Here are some examples:

-- sample comparisons (case sensitive) 
SELECT CASE WHEN 'abc' < 'ABC' THEN 1 ELSE 0 END Example1 
, CASE WHEN 'abc' > 'abc' THEN 1 ELSE 0 END Example2 
, CASE WHEN 'abc' >= 'abc ' THEN 1 ELSE 0 END Example3 
, CASE WHEN 'abc' <> 'ABC' THEN 1 ELSE 0 END Example4 
, CASE WHEN 'abc' > '' THEN 1 ELSE 0 END Example5 
, CASE WHEN ' ' > '' THEN 1 ELSE 0 END Example6 

comparisons - case insensitiveness

    The case sensitivity could be changed at attribute, table or database level. As we don’t deal with a table and the don’t want to complicate too much the queries, let’s consider changing the sensitivity at database level. So if you are using a non-production database, try the following script in order to enable, respectively to disable the case sensitivity:

--enabling case sensitivity for a database 
ALTER DATABASE <database name>  
COLLATE Latin1_General_CS_AS  

--disabling case sensitivity for a database 
ALTER DATABASE <database name> 
COLLATE Latin1_General_CI_AS 
 
    In order to test the behavior of case sensitivity, enable first the sensitivity and then rerun the previous set of example (involving case sensitivity).

comparisons - case sensitiveness

    After that you could disable again the case sensitivity by running the last script. Please note that if your database has other collation, you’ll have to change the scripts accordingly in order to point to your database’s collation.

03 January 2011

💎SQL Reloaded: Pulling the Strings of SQL Server V (Character Indexes)

A string is in fact a concatenation or chain of characters, each character in the string having a position referred as character’s index. In programming it’s useful to identify the first occurrence of a character or string in forward or backwards order, with a few simple tricks being possible to identify all the occurrences of the characters or whole strings in a string. Such functions are appearing in some of the well known programming languages under names like InStr, IndexOf for forward processing, respectively, InStrRev, LastIndexOf. SQL Server has only one function for this purpose, namely CharIndex, used to retrieve the position where a character is found starting from a given position. If no occurrence was found, the function returns 0, while if no start position if given, the default is 0. There are several other remarks that could be found in MSDN, and it’s actually always a good idea to consult the documentation from time to time, not only when meeting a new function. But let’s get back to the CharIndex function with a few examples based on searching a character within a given string:

-- searching a character into a string 
DECLARE @string varchar(50)  
SET @string = 'this is a string'  
SELECT CharIndex(' ', @string) Example1  
, CharIndex(NULL, @string) Example2 
, CharIndex(NULL, @string, NULL) Example3 
, CharIndex(NULL, NULL, NULL) Example4 
Output:
Example1 Example2 Example3 Example4
5 NULL NULL NULL

If in the first example the function is used to retrieve the first occurrence of a space in the given string, in the last three examples is highlighted the behavior of the function in case one of the parameters is NULL, in such cases the function returning a NULL – this means that when other behavior is expected, it falls in your attributions to handle the NULL values accordingly.

Here’s a second set of examples based on the search of a string within another string. As can be seen there isn’t a difference at all in how the function is applied:

-- searching a string into a string
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_is_another_string'  
SET @search = 'is'  
SELECT @string Example1 -- simple string  
, CharIndex(@search, @string) Example2 -- occurring position  
, CharIndex(@search, @string, 5) Example3 -- second occurring position 
Output:
Example1 Example2 Example3
this_is_another_string 3 6

From the three examples, the third deserves some special attention because it attempts to retrieve the second occurrence of the string by using a hardcoded starting value. In fact the search could start a position further from the first occurrence, something like in the below example:

-- searching a string into a string in foward/reverse order 
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_was_a_string'  
SET @search = '_'  
SELECT @string Example1 -- simple string  
, CharIndex(@search, @string) Example2 -- 1st occuring position (forward)  
, CharIndex(@search, @string, CharIndex(@search, @string)+1) Example3 -- 2nd occuring position (forward)  
, CharIndex(@search, @string, CharIndex(@search, @string, CharIndex(@search, @string)+1)+1) Example4 -- 3rd 
occuring position (forward)
Output:
Example1 Example2 Example3 Example4
this_was_a_string 5 9 11

The logic could be repeated over and over again in order to retrieve the n-th position, list all the positions of occurrence or the total number of occurrences.

Even if SQL Server doesn’t provide a function for retrieving the first occurrence in backward order, in other words starting from the end to the start, there is actually a simple trick that might do it. As SQL Server provides a Reverse function, which reverses the value of the string provided as parameter.

-- 1st occurring position backwards 
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_was_a_string'  
SET @search = '_'  
SELECT Len(@string) - CharIndex(@search, Reverse(@string))+1 Example 

Unfortunately the logic needed to retrieve the second and following occurrences starting from a given position it’s not so easy readable. Therefore, for the sake of “simplicity”, here’s the logic for reverse processing incorporated in CharIndexRev function:

-- reverse CharIndex 
CREATE FUNCTION dbo.CharIndexRev( 
@search varchar(50) 
, @string varchar(500) 
, @start int = 0) 
RETURNS int 
AS BEGIN 
RETURN Len(@string) - CharIndex(@search, Reverse(@string), CASE WHEN IsNull(@start, 0) != 0 THEN Len(@string) - @start+1 ELSE 0 END)+1 
END 

And here’s the function at work:
 
-- searching a string into a string in reverse order 
DECLARE @string varchar(50) 
DECLARE @search varchar(50) 
SET @string = 'this_was_a_string' 
SET @search = '_' 
SELECT dbo.CharIndexRev(NULL, @string, 0) Example1 
, dbo.CharIndexRev(@search, NULL, 0) Example2 
, dbo.CharIndexRev(NULL, NULL, 0) Example3 
, dbo.CharIndexRev(@search, @string, 0) Example4 
, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, 0)-1) Example5 
, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, 0)-1)-1) Example6 
Output:
Example1 Example2 Example3 Example4 Example5 Example6
NULL NULL NULL 11 9 5

As can be seen, the output of the last three examples matches the reversed output of the previous set (11, 9, 5) vs. (5, 9, 11), while the first three examples exemplify the behavior of the function when one of the parameters is Null.
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.