Showing posts with label CharIndex. Show all posts
Showing posts with label CharIndex. Show all posts

21 May 2020

💎🏭SQL Reloaded: Functions Useful in Data Migrations

Left and Right-Padding

Oracle has the LPAD and RPAD functions which return and expression, left-padded, respectively right-padded with the specified characters. The functions are useful in formatting unique identifiers to fit a certain format (e.g. 0000012345 instead of 12345). Here’re similar implementations of the respective functions:

-- left padding function 
CREATE FUNCTION dbo.LeftPadding( 
  @str varchar(50) 
, @length int  
, @padchar varchar(1))
RETURNS varchar(50)
AS 
BEGIN 
  RETURN CASE  
    WHEN LEN(@str)<@length THEN Replicate(@padchar, @length-LEN(@str)) + @str       
    ELSE @str  
    END 
END  

-- example left padding 
SELECT dbo.LeftPadding('12345', '10', '0')

-- right padding function 
CREATE FUNCTION dbo.RightPadding( 
  @str varchar(50) 
, @length int  
, @padchar varchar(1))
RETURNS varchar(50)
AS 
BEGIN 
  RETURN CASE  
    WHEN LEN(@str)<@length THEN @str + Replicate(@padchar, @length-LEN(@str))      
	ELSE @str  
  END
END  

-- example right padding 
SELECT dbo.RightPadding('12345', '10', '0')

Left and Right Side

When multiple pieces of data are stored within same same attribute, it’s helpful to get the left, respectively the right part of the string based on a given delimiter, where the reverse flag tells the directions in which the left is applied:

-- left part function 
CREATE FUNCTION dbo.CutLeft( 
  @str varchar(max) 
, @delimiter varchar(1)
, @reverse bit = 0)
RETURNS varchar(max)
AS 
BEGIN 
  RETURN CASE  
     WHEN CharIndex(@delimiter, @str)>0 THEN 
       CASE @reverse 
         WHEN 0 THEN Left(@str, CharIndex(@delimiter, @str)-1)    
         ELSE Left(@str, Len(@str)-CharIndex(@delimiter, Reverse(@str))) 
       END
	 ELSE @str  
  END
END  

-- example left part 
SELECT dbo.CutLeft('12345,045,000', ',', 0)
, dbo.CutLeft('12345,045,000', ',', 1)



-- right part function 
CREATE FUNCTION dbo.CutRight( 
  @str varchar(max) 
, @delimiter varchar(1)
, @reverse bit = 0)
RETURNS varchar(max)
AS 
BEGIN 
  RETURN CASE  
    WHEN CharIndex(@delimiter, @str)>0 THEN 
      CASE @reverse 
        WHEN 0 THEN Right(@str, CharIndex(@delimiter, Reverse(@str))-1) 
        ELSE Right(@str, Len(@str)-CharIndex(@delimiter, @str)) 
      END
   ELSE @str  
  END
END  

-- example right part 
SELECT dbo.CutRight('12345,045,000', ',', 0)
, dbo.CutRight('12345,045,000', ',', 1)

Replacing Special Characters

Special characters can prove to be undesirable in certain scenarios (e.g. matching values, searching). Until a “Replace from” function will be made available, the solution is to include the replacements in a user-defined function similar with the one below:

DROP FUNCTION IF EXISTS [dbo].[ReplaceSpecialChars]

CREATE FUNCTION [dbo].[ReplaceSpecialChars](
@string nvarchar(max)
, @replacer as nvarchar(1) = '-'
) RETURNS nvarchar(max)
-- replaces special characters with a given character (e.g. an empty string, space)
AS
BEGIN   
  IF CharIndex('*', @string) > 0  
     SET @string = replace(@string, '*', @replacer)	
    
  IF CharIndex('#', @string) > 0  
     SET @string = replace(@string, '#', @replacer)	
    
  IF CharIndex('$', @string) > 0  
     SET @string = replace(@string, '$', @replacer)	
    
  IF CharIndex('%', @string) > 0  
     SET @string = replace(@string, '%', @replacer)	
    
  IF CharIndex('&', @string) > 0  
     SET @string = replace(@string, '&', @replacer)	
    
  IF CharIndex(';', @string) > 0  
     SET @string = replace(@string, ';', @replacer)	
    
  IF CharIndex('/', @string) > 0  
     SET @string = replace(@string, '/', @replacer)	
    
  IF CharIndex('?', @string) > 0  
     SET @string = replace(@string, '?', @replacer)	
    
  IF CharIndex('\', @string) > 0  
     SET @string = replace(@string, '\', @replacer)	
    
  IF CharIndex('(', @string) > 0  
     SET @string = replace(@string, '(', @replacer)	
    
  IF CharIndex(')', @string) > 0  
     SET @string = replace(@string, ')', @replacer)	
    
  IF CharIndex('|', @string) > 0  
     SET @string = replace(@string, '|', @replacer)	
    
  IF CharIndex('{', @string) > 0  
     SET @string = replace(@string, '{', @replacer)	
    
  IF CharIndex('}', @string) > 0  
     SET @string = replace(@string, '}', @replacer)	
    
  IF CharIndex('[', @string) > 0  
     SET @string = replace(@string, '[', @replacer)	
    
  IF CharIndex(']', @string) > 0  
     SET @string = replace(@string, ']', @replacer)	
                                
  RETURN (LTrim(RTrim(@string)))
END

SELECT [dbo].[ReplaceSpecialChars]('1*2#3$4%5&6;7/8?9\10(11)12|13{14}15[16]', '')
SELECT [dbo].[ReplaceSpecialChars]('1*2#3$4%5&6;7/8?9\10(11)12|13{14}15[16]', ' ')

Other type of special characters are the umlauts (e.g. ä, ß, ö, ü from German language):

DROP FUNCTION IF EXISTS [dbo].[ReplaceUmlauts]

CREATE FUNCTION [dbo].[ReplaceUmlauts](
@string nvarchar(max)
) RETURNS nvarchar(max)
-- replaces umlauts with their equivalent
AS
BEGIN   
  IF CharIndex('ä', @string) > 0  
     SET @string = replace(@string, 'ä', 'ae')	
    
  IF CharIndex('ö', @string) > 0  
     SET @string = replace(@string, 'ö', 'oe')	
    
  IF CharIndex('ß', @string) > 0  
     SET @string = replace(@string, 'ß', 'ss')	
    
  IF CharIndex('%', @string) > 0  
     SET @string = replace(@string, 'ü', 'ue')	
    
                                
  RETURN Trim(@string)
END

SELECT [dbo].[ReplaceUmlauts]('Hr Schrötter trinkt ein heißes Getränk')

Handling Umlauts

Another type of specials characters are the letter specific to certain languages that deviate from the Latin characters (e.g. umlauts in German, accents in French), the usage of such characters introducing further challenges in handling the characters, especially when converting the data between characters sets. A common scenario is the one in which umlauts in ISO-8891-1 are encoded using two character sets. Probably the easiest way to handle such characters is to write a function as follows:

-- create the function 
CREATE FUNCTION [dbo].[ReplaceCodes2Umlauts](
  @string nvarchar(max)
) RETURNS nvarchar(max)
-- replaces ISO 8859-1 characters with the corresponding encoding 
AS
BEGIN   
  IF CharIndex('ß', @string) > 0  
     SET @string = replace(@string, 'ß', 'ß')	

  IF CharIndex('ö', @string) > 0  
     SET @string = replace(@string, 'ö', 'ö')	

  IF CharIndex('Ö', @string) > 0  
     SET @string = replace(@string, 'Ö', 'Ö')	
    
  IF CharIndex('ü', @string) > 0  
     SET @string = replace(@string, 'ü', 'ü')	

  IF CharIndex('Ü', @string) > 0  
     SET @string = replace(@string, 'Ãœ', 'Ü')	

  IF CharIndex('ä', @string) > 0  
     SET @string = replace(@string, 'ä', 'ä')

  IF CharIndex('Ä', @string) > 0  
     SET @string = replace(@string, 'Ä', 'Ä')
                                
  RETURN (LTrim(RTrim(@string)))
END


--test the function 
SELECT [dbo].[ReplaceCodes2Umlauts]('Falsches Ãœben von Xylophonmusik quält jeden größeren Zwerg')

In the inverse scenario, at least for the German language is possible to replace the umlauts with a set the corresponding transliterations:

-- drop the function 
DROP FUNCTION IF EXISTS [dbo].[ReplaceUmlauts]

-- create the function 
CREATE FUNCTION [dbo].[ReplaceUmlauts](
  @string nvarchar(max)
) RETURNS nvarchar(max)
-- replaces umlauts with corresponding transliterations 
AS
BEGIN   
  IF CharIndex('ß', @string) > 0  
     SET @string = replace(@string, 'ß', 'ss')	

  IF CharIndex('ü', @string) > 0  
     SET @string = replace(@string, 'ü', 'ue')	

  IF CharIndex('ä', @string) > 0  
     SET @string = replace(@string, 'ä', 'ae')	
    
  IF CharIndex('ö', @string) > 0  
     SET @string = replace(@string, 'ö', 'oe')	
                                
  RETURN (LTrim(RTrim(@string)))
END

--test the function 
SELECT [dbo].[ReplaceUmlauts]('Falsches üben von Xylophonmusik quält jeden größeren Zwerg')

Notes:
The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

24 February 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VI (Subparts of a String)

No matter how normalized a database is, there will always be the need to encode multiple semantic entities in a string, needing thus to extract them later. For example data like the Street Name and Number, or the Country Phone Prefix and Phone Number, the First and the Last Name, etc. Another frequent scenario is the misuse of a long-sized string data types to store multiple delimited attributes or simply a whole text like a Comment or Description. The easiest scenarios to deal with are the ones when you know the rules behind your encoding, preferably dealing with a fix length encoding or only a given delimiter. An example of fix length encoding is the IBAN Number, the country specific VAT Number or any other artificial constructed/standardized encoding. According to Wikipedia, IBAN (International Bank Account) Number consists of  ISO 3166-1 alpha-2 country code, followed by two check digits that are calculated using a mod-97 technique, and Basic Bank Account Number (BBAN) with up to thirty alphanumeric characters. Taking the IBAN example provided for Switzerland, it could be stored as “CH93 0076 2011 6238 5295 7” or “CH9300762011623852957”, in either case when the country is not provided explicitly it would be interesting to extract it from the IBAN together with the BBAN. How would we do that?

Many of the programming languages I worked with provide a function for extracting a substring from a string - Mid (VB/VBScript), Substring in C# Substr in Oracle, etc. SQL Server is actually providing three functions for this purpose: Substring, Left, respectively Right. The Substring function extracts a substring of a given length starting from a given position. The Left and Right functions return, as their name suggests, the left part, respectively the right part of a character string with the specified number of characters. The use of Left and Right functions seems to be redundant, as they are a particular case of Substring, however they can simplify sometimes the code, as can be seen from below example based on the above IBAN example.

-- parsing a VAT Number 
DECLARE @VAT varchar(50) 
SET @VAT = 'CH9300762011623852957' 
SELECT @VAT VATNumber 
, LEFT(@VAT, 2) CountryCode1 
, SUBSTRING(@VAT, 1, 2) CountryCode2 
, SUBSTRING(@VAT, 3, 2) CheckDigits 
, RIGHT(@VAT, Len(@VAT)-4) BBAN1 
, SUBSTRING(@VAT, 5, Len(@VAT)-4) BBAN2 
, CASE  
     WHEN LEN(@str)<@length THEN Replicate(@padchar, @length-LEN(@str)) + @str       
     ELSE @str  
END LeftPadding  
, CASE  
     WHEN LEN(@str)<@length THEN @str + Replicate(@padchar, @length-LEN(@str))      
     ELSE @str  
END RightPadding 

Substrings Example 1

Even if the IBAN has a variable-length component (the BBAN) given the fact that the other two components are fixed, this allows us to clearly extract each component. The example shows also the equivalent call of Substring function for Left (Country Code extraction), respectively Right (BBAN extraction).

What happens if there are more than one variable-length components? For such scenarios it’s useful to introduce a delimiter, it could be a comma, a dash, space, etc. It’s the case of a First and Last Name stored in the same attribute. Normally only one component qualifies as Last Name, and for simplicity let’s consider it as being stored first and space as delimiter. In order to identify the components, it’s enough to identify the first occurrence of the delimiter by using the CharIndex function.
 
-- parsing a Person's Name 
DECLARE @Name varchar(50) 
SET @Name = 'Stone Sharon' 
SELECT @Name Name 
, LEFT(@Name, CHARINDEX(' ', @Name)) LastName 
, RIGHT(@Name, LEN(@Name) - CHARINDEX(' ', @Name)) FirstName 
Output:
Name LastName FirstName
Stone Sharon Stone Sharon

The code for cutting the left, respectively right part of a string is pretty simple and over the years I used it quite often, so it makes sense to encapsulate it in a function., like I did in an older post.

When a delimiter is used repeatedly in a string, normally we need to identify each component in the string. Sometimes the number of components can be given, other times not. For this purpose can be used a common table expression, and here is another example in which the space is used as delimiter – extracting the words from a given sentence.

-- extracting the words from a sentence 
DECLARE @str nvarchar(100) 
SET @str = 'This is just a test' 
;WITH CTE (PrevString, Position, Word) 
AS ( 
    SELECT LTrim(RTrim( CASE  
         WHEN CharIndex(' ', @str)>0 THEN Right(@str, Len(@str)-CharIndex(' ', @str)) 
         ELSE '' 
     END)) PrevString 
, 1 Position  
, LTrim(RTrim(CASE  
     WHEN CharIndex(' ', @str)>0 THEN LEFT(@str, CharIndex(' ', @str)) 
     ELSE @str 
END)) Word 
UNION ALL  
SELECT LTrim(RTrim(CASE  
     WHEN CharIndex(' ', PrevString)>0 THEN Right(PrevString, Len(PrevString)-CharIndex(' ', PrevString)) 
     ELSE '' 
END)) PrevString 
, Position + 1 Position  
, LTrim(RTrim(CASE  
     WHEN CharIndex(' ', PrevString)>0 THEN LEFT(PrevString, CharIndex(' ', PrevString)) 
     ELSE PrevString 
END)) Word 
FROM CTE  
WHERE Len(PrevString)>0 
)  
SELECT PrevString 
, Word  
, Position 
FROM CTE 
OPTION (maxrecursion 100)
Output
PrevString Word Position
is just a test This 1
just a test is 2
a test just 3
test a 4
test 5

The logic works for a sentence, and if we ignore the fact that some punctuation signs are appearing at the end of the words, it might work  as well for a whole phrase, considering that the punctuation signs can be replaced from the end result. It would be useful for example to generalize the logic for a set of delimiters, in this case the other punctuation signs (e.g. “,”, “;”, “!”, etc.), however this would mean to identify which of the delimiters is used first or to apply the the same logic for the first delimiter, then for the second and so on. In addition, if the number of encoded elements within a value remain the same, a pivot can be applied on the final result and have thus all values’ elements within the same row.

Happy Coding!   

03 January 2011

💎SQL Reloaded: Pulling the Strings of SQL Server V (Character Indexes)

A string is in fact a concatenation or chain of characters, each character in the string having a position referred as character’s index. In programming it’s useful to identify the first occurrence of a character or string in forward or backwards order, with a few simple tricks being possible to identify all the occurrences of the characters or whole strings in a string. Such functions are appearing in some of the well known programming languages under names like InStr, IndexOf for forward processing, respectively, InStrRev, LastIndexOf. SQL Server has only one function for this purpose, namely CharIndex, used to retrieve the position where a character is found starting from a given position. If no occurrence was found, the function returns 0, while if no start position if given, the default is 0. There are several other remarks that could be found in MSDN, and it’s actually always a good idea to consult the documentation from time to time, not only when meeting a new function. But let’s get back to the CharIndex function with a few examples based on searching a character within a given string:

-- searching a character into a string 
DECLARE @string varchar(50)  
SET @string = 'this is a string'  
SELECT CharIndex(' ', @string) Example1  
, CharIndex(NULL, @string) Example2 
, CharIndex(NULL, @string, NULL) Example3 
, CharIndex(NULL, NULL, NULL) Example4 
Output:
Example1 Example2 Example3 Example4
5 NULL NULL NULL

If in the first example the function is used to retrieve the first occurrence of a space in the given string, in the last three examples is highlighted the behavior of the function in case one of the parameters is NULL, in such cases the function returning a NULL – this means that when other behavior is expected, it falls in your attributions to handle the NULL values accordingly.

Here’s a second set of examples based on the search of a string within another string. As can be seen there isn’t a difference at all in how the function is applied:

-- searching a string into a string
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_is_another_string'  
SET @search = 'is'  
SELECT @string Example1 -- simple string  
, CharIndex(@search, @string) Example2 -- occurring position  
, CharIndex(@search, @string, 5) Example3 -- second occurring position 
Output:
Example1 Example2 Example3
this_is_another_string 3 6

From the three examples, the third deserves some special attention because it attempts to retrieve the second occurrence of the string by using a hardcoded starting value. In fact the search could start a position further from the first occurrence, something like in the below example:

-- searching a string into a string in foward/reverse order 
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_was_a_string'  
SET @search = '_'  
SELECT @string Example1 -- simple string  
, CharIndex(@search, @string) Example2 -- 1st occuring position (forward)  
, CharIndex(@search, @string, CharIndex(@search, @string)+1) Example3 -- 2nd occuring position (forward)  
, CharIndex(@search, @string, CharIndex(@search, @string, CharIndex(@search, @string)+1)+1) Example4 -- 3rd 
occuring position (forward)
Output:
Example1 Example2 Example3 Example4
this_was_a_string 5 9 11

The logic could be repeated over and over again in order to retrieve the n-th position, list all the positions of occurrence or the total number of occurrences.

Even if SQL Server doesn’t provide a function for retrieving the first occurrence in backward order, in other words starting from the end to the start, there is actually a simple trick that might do it. As SQL Server provides a Reverse function, which reverses the value of the string provided as parameter.

-- 1st occurring position backwards 
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_was_a_string'  
SET @search = '_'  
SELECT Len(@string) - CharIndex(@search, Reverse(@string))+1 Example 

Unfortunately the logic needed to retrieve the second and following occurrences starting from a given position it’s not so easy readable. Therefore, for the sake of “simplicity”, here’s the logic for reverse processing incorporated in CharIndexRev function:

-- reverse CharIndex 
CREATE FUNCTION dbo.CharIndexRev( 
@search varchar(50) 
, @string varchar(500) 
, @start int = 0) 
RETURNS int 
AS BEGIN 
RETURN Len(@string) - CharIndex(@search, Reverse(@string), CASE WHEN IsNull(@start, 0) != 0 THEN Len(@string) - @start+1 ELSE 0 END)+1 
END 

And here’s the function at work:
 
-- searching a string into a string in reverse order 
DECLARE @string varchar(50) 
DECLARE @search varchar(50) 
SET @string = 'this_was_a_string' 
SET @search = '_' 
SELECT dbo.CharIndexRev(NULL, @string, 0) Example1 
, dbo.CharIndexRev(@search, NULL, 0) Example2 
, dbo.CharIndexRev(NULL, NULL, 0) Example3 
, dbo.CharIndexRev(@search, @string, 0) Example4 
, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, 0)-1) Example5 
, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, 0)-1)-1) Example6 
Output:
Example1 Example2 Example3 Example4 Example5 Example6
NULL NULL NULL 11 9 5

As can be seen, the output of the last three examples matches the reversed output of the previous set (11, 9, 5) vs. (5, 9, 11), while the first three examples exemplify the behavior of the function when one of the parameters is Null.

Notes:
The queries work also in SQL databases in Microsoft Fabric.

Happy coding!
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.