Showing posts with label spaces. Show all posts
Showing posts with label spaces. Show all posts

07 January 2011

💎🏭SQL Reloaded: Pulling the Strings of SQL Server IV (Spaces, Trimming, Length and Comparisons)

In the previous post on concatenation, I was talking about the importance of spaces and other delimiters in making concatenations’ output more “readable”. Excepting their importance in natural language, the spaces have some further implication in the way strings are stored and processed. As remarked in the introductory post from this topic, there are two types of spaces that stand out in the crowds of spaces, namely the trailing spaces, the spaces found at the right extremity of a string,  respectively the leading spaces, the spaces found at the left extremity of a string. 

Are few the cases when the two trailing space are of any use, therefore databases like SQL Server usually ignore them. The philosophy about leading space is slightly different because there are cases in which they are used in order to align the text to the right, however there are tools which are cutting off the leading spaces. When no such tools are available or any of the two types of spaces are not cut off, then we’ll have do to it ourselves, and here we come to the first topic of this post, trimming.

Trimming

Trimming is the operation of removing the empty spaces found at the endings of a string. Unlike other programming languages which use only one function for this purpose (e.g. Trim function in VB or Oracle), SQL Server makes use of two functions used for this purpose, LTrim used to trim the spaces found at the left ending of the string, respectively RTrim, used to trim the spaces found at the right ending of the string.

-- trimming a string 
SELECT  LTrim(' this is a string ') Length1 -- left trimming 
, RTrim(' this is a string ') Length2 --right trimming 
, LTrim(RTrim(' this is a string ')) Length2 --left & right trimming 

As can be seen it’s not so easy to identify the differences, maybe the next function will help to see that there is actually a difference.

Note:
1) If it looks like the two trimming functions are not working with strings having leading or trailing spaces, then maybe you are not dealing with an empty character but rather with other characters like CR, LF, CRLF or other similar characters, rendered sometimes like an empty character.
2)   In SQL Server 2017 was introduced the Trim function which not only replaces the combined use of LTrim and RTrim functions, but it allows to replace other specified characters (including CR, LF, Tab) from the start or end of a string. (see post

Length

Before approaching other operations with strings, it’s maybe useful (actually necessary as we will see) to get a glimpse of the way we can determine the length of a string value, in other words how many characters it has, this being possible by using the Len function:

-- length of a string 
SELECT Len('this is a string') Length1 -- simple string 
, Len('this is a string ') Length2 --ending in space 
, Len(' this is a string') Length3 --starting with a space 
, Len(' this is a string ') Length4 --starting & ending with a space 
, Len(LTrim(' this is a string ')) Length5 --length & left trimming 
,Len(RTrim(' this is a string ')) Length5 --length & right trimming
,Len(LTrim(RTrim(' this is a string '))) Length5 --length, left & right trimming    

In order to understand the above results, one observation is necessary: if a strings ends in with one or more empty characters, the Len function ignores them, though this doesn’t happen with the leading empty characters, they needing to be removed explicitly if needed.

Comparisons

The comparison operation points the differences or similarities existing between two data types, involving at minimum two expressions that reduce at runtime to a data type and the comparison operator. This means that each member of comparison could include any valid combinations of functions as long they are reduced to compatible data types. In what concerns the comparison of strings, things are relatively simple, the comparison being allowed  independently on whether they have fix or varying length. Relatively simple because if we’d have to go into details, then we’d need to talk about character sets (also called character encoding or character maps) and other string goodies the ANSI SQL standard(s) are coming with, including a set of rules that dictate the behavior of comparisons. So, let’s keep things as simple as possible. As per above attempt of definition, a comparison implies typically an equality, respectively difference, based on equal (“=”), respectively not equal (“<>” or “!=”). Here are some simple examples:

-- sample comparisons 
SELECT CASE WHEN 'abc' != 'abc ' THEN 1 ELSE 0 END Example1 
, CASE WHEN ' abc' != 'abc' THEN 1 ELSE 0 END Example2 
, CASE WHEN ' ' != '' THEN 1 ELSE 0 END Example3 
-- error comparison , CASE WHEN 'abc' != NULL THEN 1 ELSE 0 END Example4 
, CASE WHEN 'abc' = NULL THEN 1 ELSE 0 END Example5 
-- adequate NULL comparison , CASE WHEN 'abc' IS NOT NULL THEN 1 ELSE 0 END Example6  
, CASE WHEN 'abc' IS NULL THEN 1 ELSE 0 END Example7 
Output:
Example1 Example2 Example3 Example5 Example7
0 1 0 0 0

The first three examples are demonstrating again the behavior of leading, respectively trailing spaces. The next two examples, even if they seem quite logical in terms of natural language semantics, they are wrong from the point of view of SQL semantics, and this because the comparison of values in which one of them is NULL equates to a NULL, thus resulting the above behavior in which both expressions from the 4th and 5th example equate to false. The next two examples show how the NULLs should be handled in comparisons with the help of IS operator, respectively it’s negation – IS NOT. 

 Like in the case of numeric values, the comparison between two strings could be expressed by using the “less than” (“<;”) and “greater than” (“?”) operators, alone or in combination with the equality operator (“<=”, “>=”) or the negation operator (“!>”, “<!”) (see comparison operators in MDSN). Typically an SQL Server database is case insensitive, so there  will be no difference between the following strings: “ABC”, “abc”, “Abc”, etc. Here are some examples:

-- sample comparisons (case sensitive) 
SELECT CASE WHEN 'abc' < 'ABC' THEN 1 ELSE 0 END Example1 
, CASE WHEN 'abc' > 'abc' THEN 1 ELSE 0 END Example2 
, CASE WHEN 'abc' >= 'abc ' THEN 1 ELSE 0 END Example3 
, CASE WHEN 'abc' <> 'ABC' THEN 1 ELSE 0 END Example4 
, CASE WHEN 'abc' > '' THEN 1 ELSE 0 END Example5 
, CASE WHEN ' ' > '' THEN 1 ELSE 0 END Example6 
Output:
Example1 Example2 Example3 Example4 Example5 Example6
0 0 1 0 1 0


The case sensitivity could be changed at attribute, table or database level. As we don’t deal with a table and the don’t want to complicate too much the queries, let’s consider changing the sensitivity at database level. So if you are using a non-production database, try the following script in order to enable, respectively to disable the case sensitivity:

--enabling case sensitivity for a database 
ALTER DATABASE <database name>  
COLLATE Latin1_General_CS_AS  

--disabling case sensitivity for a database 
ALTER DATABASE <database name> 
COLLATE Latin1_General_CI_AS 
 
In order to test the behavior of case sensitivity, enable first the sensitivity and then rerun the previous set of example (involving case sensitivity).
Output:
Example1 Example2 Example3 Example4 Example5 Example6
1 0 1 1 1 0
After that you could disable again the case sensitivity by running the last script. Please note that if your database has other collation, you’ll have to change the scripts accordingly in order to point to your database’s collation.

Notes:
The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

11 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server II (Creation and Selection)

It doesn’t make sense to talk about the creation of string data types without talking about their “selection” as people typically want to see also examples at work, and not only learn about theoretical facts. I’m saying that because often programming seems to be like putting together the pieces of a puzzle without knowing how the final image would look like. Anyway, coming back to the topic, what I find great about SQL Server is that it allows to “create” and select a string with a minimum overhead within a simple select statement:

-- selecting a string value 
SELECT 'this is a string' -- string 1 
, 'and this is a second string' --string2 

-- selecting a string value (named columns)  
SELECT 'this is a string' String1  
, 'and this is a second string' String2 
 
Strings - example1

Pretty simple, isn’t it? This kind of scripts could be useful especially when debugging logic or testing an expression (e.g. functions at work), the advantage residing in the fact that is not necessary to built a table in order to test simple things. When performing multiple operations with the same values could be handy to store the values in declared variables:  

-- selecting string variables 
DECLARE @string1 varchar(50) 
DECLARE @string2 char(50) 
DECLARE @string3 char(50) = 'this is a string'  

SET @string1 = 'this is a string'  

SELECT @string1 String1 , @string2 String2 
, @string3 String3 
   
Strings - example2

As can be seen a not initialized variable has by default the value NULL, fact that could lead to unexpected behavior problems when used in expressions involving multiple variables. Therefore it’s recommended to initialize the values with the empty string or at least to handle the nulls in expressions. Starting with SQL Server 2008 it’s possible to declare and initialize variables within the same statement, so the above code won’t work in previous versions, unless the third declaration is broken into two pieces as per definition of first string.

It looks like that’s all, at least in what concerns the direct declaration of a string, be it implicit or explicit. However a basic introduction into strings’ creation is incomplete without mentioning the creation of strings from other data types and the various functions that could be used for this purpose. Ignoring the Cast and Convert functions used to explicitly convert the other data types to string data types, there are several other functions for this purpose, namely Char, Space and Replicate.

Probably some of you are already familiar with the ASCII (American Standard Code for Information Interchange) character-encoding scheme (vs. binary encoding) used to encode files. ASCII code represents a mapping between number and characters, SQL Server supporting the transformation between two sets through the ASCII and Char functions. If ASCII translates a non-unicode character into an integer, the Char function translates an integer value into a non-unicode character. The integer values range between 0 and 255, they encompassing the 0-9 digits, the characters of English together with the diacritics of other important languages, punctuation signs and several graphical characters. The mapping between the two sets is unique, and as can be seen from the below example based on a common table expression, the functions are inverse:

-- ASCII character values 
WITH CTE 
AS (    
    SELECT 0 [Index]  
    UNION ALL 
    SELECT [Index]+1 [Index]  
    FROM CTE 
    WHERE [Index]<255 
) SELECT [Index] 
, CHAR([Index]) [Character] 
, ASCII(CHAR([Index])) [ASCII] 
FROM CTE 
OPTION (MAXRECURSION 0) 

There is not much to say about the Space and Replicate functions, the Space function returns a string of repeated spaces, while the Replicate function forms a string as a repeated sequence of values. The definition of the Space function could be considered as redundant as long the same output could be obtained by using the space as repeating sequence.

-- Space & Replicate at work 
SELECT Space(1) Example1 
, Space(5) Example2 
, Replicate(' ', 1) Example3 
, Replicate(' ', 5) Example4 
, Replicate('10', 1) Example5 
, Replicate('10', 2) Example6 
, Replicate('10', 3) Example7 
 
Replicate

As per the last statement, the first and third examples, respectively the second and fourth example will return the same values, unfortunately the output of the respective examples it’s not so easy to see. For exemplification, it could have been enriched by comparing or concatenating the strings, though that’s a topic for the next post. 

Notes:
The queries work also in SQL databases in Microsoft Fabric
 
Happy coding!

07 December 2010

💎SQL Reloaded: Pulling the Strings of SQL Server I 7- Introduction

    The (character) string or simply the character data type, how is named by the MSDN documentation, is one of the primary data types available in a database and probably the most complex given the fact that it can encompass any combination of numeric, literal, date or any other primary data type. In fact it could include any type of chunk of text that could be written in any language as SQL Server supports Unicode and thus most of the (written) languages. In this post I’m not intending to make a complete anthology of strings, and neither to retake the whole volume of information available in MSDN or other important books. My intent is to look at strings from a slightly different perspective, considering the various functions involving strings and how they could be used in order to provide various functionality, in fact the cornerstone of everyday developer.

   A few things to remember:

1. there were mainly two types of non-unicode strings: the char (or character) of fixed length, respectively the varchar of variable length (varying character)

2. if initially both types of strings were having a maximum length of 8000 of characters, with SQL Server it’s possible to have a varchar with maximum storage size, declared as varchar(max).

3. if in the past there were some strict recommendations in what concerns the use of char or varchar, nowadays the varchar tends to be used almost everywhere, even for the strings of length 1.

4. talking about length, it denotes the number of chracters a string stores.

5. the trailing spaces, the spaces found at the right extremity of a string are typically ignored, while the leading spaces, the spaces found at the left extremity, are not ignored.

6. starting with SQL Server 2000, for the two character data types were introduced the corresponding unicode data types prefixed with n (national): nchar, respectively nvarchar.

7. given the fact that a unicode character needs more space to store the same non-unicode string, actually the number of bits doubles, the maximum length for an unicode string is only 4000.

8. there is also a non-unicode text, respectively ntext unicode data type, designed to store maximum length, though as it seems they could become soon deprecated, so might be a good idea to avoid it.

9. not-initialized variables, including strings, have the value NULL, referred also the NULL string, therefore it’s always a good idea to initialize your variables.

10. by empty string string is designated the string containing no character “’’”, and has the length 0.

11. there are several specific functions available for the creation, manipulation and conversion of strings from and to other data types.

12. not all of the aggregated functions work with string values (specifically the ones requesting a number value like SUM, AVG, STDV).

13. the operations performed on strings of different data types are generally not impacted by this aspect, though there are some exceptions.

14. there are several (basic) operations with strings, typically concatenation, extraction of subpart of a string, insertion, replacement or deletion of characters, rearrangement of string’s characters, trimming, splitting it in substrings (decomposition), etc.

15. there are several numeric values based on strings: length, position of a given text in a text, number of not empty characters, encoding of a character, on whether the text represents a valid numeric or date values, etc.

14 September 2007

💎SQL Reloaded: The Stuff function

No matter how much experience we have in a programming language or a technology, there is always something new to learn. There are a few hardly used functions in SQL Server 2000, but they could be really useful in certain situations. One of such functions is Stuff, I discovered its use long time after I started to play with SQL. 

  Stuff ( character_expression , start , length , character_expression )

--inserting a string inside another without doing a replacement
SELECT Stuff('This is just a test', 3, 0, 'x')  -- Output: Thxis is just a test 

--inserting a string inside another without doing a replacement 
SELECT Stuff('This is just a test', 3, 5, 'at was') --Output: That was just a test  

 So, it could be useful when we check whether a character is on a certain position, and replace it with another character. Normally we would have to write something like: 

DECLARE @string varchar(50) 
SET @string = 'DE1988299X8829' 
SELECT 
    CASE WHEN Substring(@string, 10,1) = 'X' THEN Stuff(@string, 10, 1, 'Y') 
   ELSE @string 
END 
Output: DE1988299Y8829     

Another function I haven't saw too often in SQL logic is Replicate, yeah, it does exactly what it's name suggests - it takes a string and replicates it's content multiple times. 

 Replicate ( character_expression , integer_expression ) 

 SELECT Replicate ('0', 10) --Output: 0000000000 
 SELECT Replicate ('tone', 3) --Output: tonetonetone     

The function could be useful when we need to put a number of characters in front of a value. For example a table contains integer values, but in a report we need them with leading zeroes (e.g. 00056 instead of 56). I tried to implement such functionality as a function, in a previous posting with the help of Space function; using Space and Replace functions can be obtained the same result as using Replicate

  SELECT Replace(Space(3), ' ', 'tone') --Output: tonetonetone

Notes:
The code has been tested successfully also on a SQL database in Microsoft Fabric.

Happy coding!

14 November 2005

💎SQL Reloaded: Out of Space

Some time ago I found a post in which somebody was annoyed of writing code like this:

DECLARE @zipcode char(5)

SET @zipcode = '90'

SELECT CASE len(@zipcode)
 WHEN 1 THEN '0000' + @zipcode
 WHEN 2 THEN '000' + @zipcode
 WHEN 3 THEN '00' + @zipcode
 WHEN 4 THEN '0' + @zipcode
 WHEN 5 THEN @zipcode
END

I remembered I met the same situation, but instead of writing a line of code for each case, I prefered to write a simple function, which used the Space and Replace built-in functions to add first a number of spaces and then replace them with a defined character. Maybe somebody will argue about the performance of such a function, but I prefer in many cases the reusability against performance, when the difference is not that big. Additionally, this enhances code’s readability.    

So, here is the function:

CREATE FUNCTION dbo.FillBefore(
   @target varchar(50)
 , @length int
 , @filler char(1))
/*
   Purpose: enlarges a string to a predefiend length by puting in front of it the same character
   Parameters: @target varchar(50) - the string to be transformed
  , @length int - the length of output string
  , @filler char(1) - the character which will be used as filler 
   Notes: the function works if the length of @length is greater or equal with the one of the @target, otherwise will return a NULL
*/
RETURNS varchar(50)
AS
BEGIN
   RETURN Replace(Space(@length-len(IsNull(@target, ''))), ' ', @filler) 
       + IsNull(@target, '')
END

-- testing the function
SELECT dbo.FillBefore('1234', 5, '0')
SELECT dbo.FillBefore('1234', 10, ' ')
SELECT dbo.FillBefore(1234, 10, ' ')
SELECT dbo.FillBefore(Cast(1234 as varchar(10)), 10, '0')

Another scenario in which the Space function was useful was when I had to save the result of a query to a text file in fixed width format. It resumes to the same logic used in FillBefore function, except the spaces must be added at the end of the string. For this I just needed to modify the order of concatenation and create a new function:

CREATE FUNCTION dbo.FillAfter(
   @target varchar(50)
 , @length int
 , @filler char(1))
/*
   Purpose: enlarges a string to a predefiend length by adding at the end the same character
   Parameters: @target varchar(50) - the string to be transformed
  , @length int - the length of output string
  , @filler char(1) - the character which will be used as filler 
   Notes: the function works if the length of @length is greater or equal with the one of the @target, otherwise will return a NULL

*/
RETURNS varchar(50)
AS
BEGIN
   RETURN IsNull(@target, '') 
        + Replace(Space(@length-len(IsNull(@target,''))), ' ', @filler)
END

  -- testing the function 
  SELECT dbo.FillAfter('1234', 5, '0')
  SELECT dbo.FillAfter('1234', 10, ' ')
  SELECT dbo.FillAfter(1234, 10, '0')
  SELECT dbo.FillAfter(Cast(1234 as varchar(10)), 10, '0')
  SELECT dbo.FillAfter(NULL, 10, '0')

Notes:
1. In SQL Server 2005, the output of a query can be saved using a DTS Package with fix format directly to a text file using ‘fixed width’ (the columns are defined by fix widths) or ‘ragged right’ (the columns are defined by fix widths, except the last one which is delimited by the new line character) format for the destination file.
2. To document or not to document. I know that many times we don't have the time to document the database objects we created, but I found that a simple comment saved more time later, even if was about my or other developer's time.
3. If is really needed to use the CASE function, is good do a analysis of the data and put as first evaluation the case with the highest probability to occur, as second evaluation the case with the second probability to occur, and so on. The reason for this is that each option is evaluated until the comparision operation evaluates to TRUE. For example, if the zip codes are made in most of the cases of 5 characters, then it makes sense to put it in the first WHEN position.
4) The code has been tested successfully also on a SQL database in Microsoft Fabric.

Happy coding!
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.