07 January 2011

Pulling the Strings of SQL Server–Part IV: Spaces, Trimming, Length and Comparisons

    In the previous post on concatenation, I was talking about the importance of spaces and other delimiters in making concatenations’ output more “readable”. Excepting their importance in natural language, the spaces have some further implication in the way strings are stored and processed. As remarked in the introductory post from this topic, there are two types of spaces that stand out in the crowds of spaces, namely the trailing spaces, the spaces found at the right extremity of a string,  respectively the leading spaces, the spaces found at the left extremity of a string. Are few the cases when the two trailing space are of any use, therefore databases like SQL Server usually ignore them. The philosophy about leading space is slightly different because there are cases in which they are used in order to align the text to the right, however there are tools which are cutting off the leading spaces. When no such tools are available or any of the two types of spaces are not cut off, then we’ll have do to it ourselves, and here we come to the first topic of this post, trimming.

Trimming

   Trimming is the operation of removing the empty spaces found at the endings of a string. Unlike other programming languages which use only one function for this purpose (e.g. Trim function in VB or Oracle), SQL Server makes use of two functions used for this purpose, LTrim used to trim the spaces found at the left ending of the string, respectively RTrim, used to trim the spaces found at the right ending of the string.
-- trimming a string
SELECT  LTrim(' this is a string ') Length1 -- left trimming
, RTrim(' this is a string ') Length2 --right trimming
, LTrim(RTrim(' this is a string ')) Length2 --left & right trimming
    As can be seen it’s not so easy to identify the differences, maybe the next function will help to see that there is actually a difference.

Note:
    If it looks like the two trimming functions are not working with strings having leading or trailing spaces, then maybe you are not dealing with an empty character but rather with other characters like CR, LF, CRLF or other similar characters, rendered sometimes like an empty character.

Length

   Before approaching other operations with strings, it’s maybe useful (actually necessary as we will see) to get a glimpse of the way we can determine the length of a string value, in other words how many characters it has, this being possible by using the Len function:
-- length of a string
SELECT Len('this is a string') Length1 -- simple string
, Len('this is a string ') Length2 --ending in space
, Len(' this is a string') Length3 --starting with a space
, Len(' this is a string ') Length4 --starting & ending with a space
, Len(LTrim(' this is a string ')) Length5 –length & left trimming
,Len(RTrim(' this is a string ')) Length5 –length & right trimming
,Len(LTrim(RTrim(' this is a string '))) Length5 –length, left & right trimming
    In order to understand the above results, one observation is necessary: if a strings ends in with one or more empty characters, the Len function ignores them, though this doesn’t happen with the leading empty characters, they needing to be removed explicitly if needed.

Comparisons

    The comparison operation points the differences or similarities existing between two data types, involving at minimum two expressions that reduce at runtime to a data type and the comparison operator. This means that each member of comparison could include any valid combinations of functions as long they are reduced to compatible data types. In what concerns the comparison of strings, things are relatively simple, the comparison being allowed  independently on whether they have fix or varying length. Relatively simple because if we’d have to go into details, then we’d need to talk about character sets (also called character encoding or character maps) and other string goodies the ANSI SQL standard(s) are coming with, including a set of rules that dictate the behavior of comparisons. So, let’s keep things as simple as possible. As per above attempt of definition, a comparison implies typically an equality, respectively difference, based on equal (“=”), respectively not equal (“<>” or “!=”). Here are some simple examples:
-- sample comparisons
SELECT CASE WHEN 'abc' != 'abc ' THEN 1 ELSE 0 END Example1
,
CASE WHEN ' abc' != 'abc' THEN 1 ELSE 0 END Example2

,
CASE WHEN ' ' != '' THEN 1 ELSE 0 END Example3

-- error comparison
, CASE WHEN 'abc' != NULL THEN 1 ELSE 0 END Example4
,
CASE WHEN 'abc' = NULL THEN 1 ELSE 0 END Example5

-- adequate NULL comparison
, CASE WHEN 'abc' IS NOT NULL THEN 1 ELSE 0 END Example6
,
CASE WHEN 'abc' IS NULL THEN 1 ELSE 0 END Example7

comparisons - set 1
   The first three examples are demonstrating again the behavior of leading, respectively trailing spaces. The next two examples, even if they seem quite logical in terms of natural language semantics, they are wrong from the point of view of SQL semantics, and this because the comparison of values in which one of them is NULL equates to a NULL, thus resulting the above behavior in which both expressions from the 4th and 5th example equate to false. The next two examples show how the NULLs should be handled in comparisons with the help of IS operator, respectively it’s negation – IS NOT. 

   Like in the case of numeric values, the comparison between two strings could be expressed by using the “less than” (“<”) and “greater than” (“?”) operators, alone or in combination with the equality operator (“<=”, “>=”) or the negation operator (“!>”, “<!”) (see comparison operators in MDSN). Typically an SQL Server database is case insensitive, so there  will be no difference between the following strings: “ABC”, “abc”, “Abc”, etc. Here are some examples:
-- sample comparisons (case sensitive)
SELECT CASE WHEN 'abc' < 'ABC' THEN 1 ELSE 0 END Example1
,
CASE WHEN 'abc' > 'abc' THEN 1 ELSE 0 END Example2

,
CASE WHEN 'abc' >= 'abc ' THEN 1 ELSE 0 END Example3

,
CASE WHEN 'abc' <> 'ABC' THEN 1 ELSE 0 END Example4

,
CASE WHEN 'abc' > '' THEN 1 ELSE 0 END Example5

,
CASE WHEN ' ' > '' THEN 1 ELSE 0 END Example6

comparisons - case insensitiveness
    The case sensitivity could be changed at attribute, table or database level. As we don’t deal with a table and the don’t want to complicate too much the queries, let’s consider changing the sensitivity at database level. So if you are using a non-production database, try the following script in order to enable the case sensitivity:
--enabling case sensitivity for a database
ALTER DATABASE <your database>
COLLATE Latin1_General_CS_AS
, and the following script to disable it:
--disabling case sensitivity for a database
ALTER DATABASE <your database>
COLLATE Latin1_General_CI_AS
    In order to test the behavior of case sensitivity, enable first the sensitivity and then rerun the previous set of example (involving case sensitivity).
comparisons - case sensitiveness
    After that you could disable again the case sensitivity by running the last script. Please note that if your database has other collation, you’ll have to change the scripts accordingly in order to point to your database’s collation.

No comments: