SQL Troubles: Nulls

Showing posts with label Nulls. Show all posts

20 February 2025

💠🛠️🗒️SQL Server: Nulls [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premise SQL Server, for other platforms please refer to the documentation.

Last updated: 20-Feb-2024

[SQL Server] Null

{def} keyword that indicates that the value is unknown [1]

different from an empty or zero value [1]
no two null values are equal [1]

comparisons between two null values, or between a null value and any other value, return unknown because the value of each NULL is unknown [1]

indicates the the value is

unknown
not applicable
to be added later
⇒ can't be used as information that is required to distinguish one row in a table from another row in a table [1]

can be assigned to a value by

explicitly stating NULL in an INSERT or UPDATE statement [1[
leaving a column out of an INSERT statement [1]

{recommendation} test for null values in queries

via IS NULL or IS NOT NULL in the WHERE clause [1]
WHEN present in data, logical and comparison operators can potentially return a third result of UNKNOWN instead of just TRUE or FALSE [1]

⇐ three-valued logic can be the source for many application errors [1]

⇐ parameters and variables not explicitly initialized can cause problems in code

{recommendation} handle null values in logic

via IsNull or Coalesce functions

{constraint} [primary kyes] if any of the columns considered in a primary key contain NULL values, the PRIMARY KEY constraint can’t be created [3]
{constraint} [UNIQUE constraint] allows the columns that make up the constraint to allow NULLs, but it doesn’t allow all key columns to be NULL for more than one row [3]
[data warehouse] nullability of columns

{best practice} define columns as NOT NULL when appropriate

{benefit} helps the Query Optimizer
{benefit} reduces in some cases the storage space required for the data
{benefit} allows SQL Server to avoid unnecessary encoding in columnstore indexes and during batch mode execution [2]

{example} [SQL Server 2000+] bigint column

when the value is defined as NOT NULL , the value fits into a single CPU register

⇒ operations on the value can be performed more quickly

a nullable bigint column requires another, 65th bit to indicate NULL values

SQL Server avoids cross-register data storage by storing some of the row values (usually the highest or lowest values) in main memory using special markers to indicate it in the data that resides in the CPU cache [2]

⇒ adds extra load during execution

{recommendation} avoid nullable columns in data warehouse environments [2]

⇐ the recommendation can apply also to OLTP databases

there are database designs that enforces not null values for all attributes

e.g. Dynamics AX 2009/365 F&O
{benefit} eliminates the need to test for null values in legacy code

{recommendation} use CHECK and UNIQUE constraints or indexes when overhead introduced by constraints or unique indexes is acceptable [2]
{recommendation} consider using filtered indexes instead of normal indexes for columns with many null values

minimizes the waste of storage space
⇐ understand the characteristics of the columns used in the queries [3]

See also: Null-ifying the World, Dynamic Queries, Handling missing dates

Previous Post <<||>> Next Post

References:

[1] Microsoft Learn (2024) SQL Server 2022: NULL and UNKNOWN (T-SQL)
[2] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.

[3] Microsoft SQL Server 2012 Internals, by Kalen Delaney, Bob Beauchemin, Conor Cunningham, Jonathan Kehayias, Benjamin Nevarez & Paul S. Randal, Microsoft Press, ISBN: 978-0-7356-5856-1 , 2013

07 January 2011

💎🏭SQL Reloaded: Pulling the Strings of SQL Server IV (Spaces, Trimming, Length and Comparisons)

In the previous post on concatenation, I was talking about the importance of spaces and other delimiters in making concatenations’ output more “readable”. Excepting their importance in natural language, the spaces have some further implication in the way strings are stored and processed. As remarked in the introductory post from this topic, there are two types of spaces that stand out in the crowds of spaces, namely the trailing spaces, the spaces found at the right extremity of a string, respectively the leading spaces, the spaces found at the left extremity of a string.

Are few the cases when the two trailing space are of any use, therefore databases like SQL Server usually ignore them. The philosophy about leading space is slightly different because there are cases in which they are used in order to align the text to the right, however there are tools which are cutting off the leading spaces. When no such tools are available or any of the two types of spaces are not cut off, then we’ll have do to it ourselves, and here we come to the first topic of this post, trimming.

Trimming

Trimming is the operation of removing the empty spaces found at the endings of a string. Unlike other programming languages which use only one function for this purpose (e.g. Trim function in VB or Oracle), SQL Server makes use of two functions used for this purpose, LTrim used to trim the spaces found at the left ending of the string, respectively RTrim, used to trim the spaces found at the right ending of the string.

-- trimming a string 
SELECT  LTrim(' this is a string ') Length1 -- left trimming 
, RTrim(' this is a string ') Length2 --right trimming 
, LTrim(RTrim(' this is a string ')) Length2 --left & right trimming

As can be seen it’s not so easy to identify the differences, maybe the next function will help to see that there is actually a difference.

Note:
1) If it looks like the two trimming functions are not working with strings having leading or trailing spaces, then maybe you are not dealing with an empty character but rather with other characters like CR, LF, CRLF or other similar characters, rendered sometimes like an empty character.

2) In SQL Server 2017 was introduced the Trim function which not only replaces the combined use of LTrim and RTrim functions, but it allows to replace other specified characters (including CR, LF, Tab) from the start or end of a string. (see post)

Length

Before approaching other operations with strings, it’s maybe useful (actually necessary as we will see) to get a glimpse of the way we can determine the length of a string value, in other words how many characters it has, this being possible by using the Len function:

-- length of a string 
SELECT Len('this is a string') Length1 -- simple string 
, Len('this is a string ') Length2 --ending in space 
, Len(' this is a string') Length3 --starting with a space 
, Len(' this is a string ') Length4 --starting &amp; ending with a space 
, Len(LTrim(' this is a string ')) Length5 --length & left trimming 
,Len(RTrim(' this is a string ')) Length5 --length & right trimming

,Len(LTrim(RTrim(' this is a string '))) Length5 --length, left & right trimming

In order to understand the above results, one observation is necessary: if a strings ends in with one or more empty characters, the Len function ignores them, though this doesn’t happen with the leading empty characters, they needing to be removed explicitly if needed.

Comparisons

The comparison operation points the differences or similarities existing between two data types, involving at minimum two expressions that reduce at runtime to a data type and the comparison operator. This means that each member of comparison could include any valid combinations of functions as long they are reduced to compatible data types. In what concerns the comparison of strings, things are relatively simple, the comparison being allowed independently on whether they have fix or varying length. Relatively simple because if we’d have to go into details, then we’d need to talk about character sets (also called character encoding or character maps) and other string goodies the ANSI SQL standard(s) are coming with, including a set of rules that dictate the behavior of comparisons. So, let’s keep things as simple as possible. As per above attempt of definition, a comparison implies typically an equality, respectively difference, based on equal (“=”), respectively not equal (“<>” or “!=”). Here are some simple examples:

-- sample comparisons 
SELECT CASE WHEN 'abc' != 'abc ' THEN 1 ELSE 0 END Example1 
, CASE WHEN ' abc' != 'abc' THEN 1 ELSE 0 END Example2 
, CASE WHEN ' ' != '' THEN 1 ELSE 0 END Example3 
-- error comparison , CASE WHEN 'abc' != NULL THEN 1 ELSE 0 END Example4 
, CASE WHEN 'abc' = NULL THEN 1 ELSE 0 END Example5 
-- adequate NULL comparison , CASE WHEN 'abc' IS NOT NULL THEN 1 ELSE 0 END Example6  
, CASE WHEN 'abc' IS NULL THEN 1 ELSE 0 END Example7

Output:

Example1	Example2	Example3	Example5	Example7
0	1	0	0	0

The first three examples are demonstrating again the behavior of leading, respectively trailing spaces. The next two examples, even if they seem quite logical in terms of natural language semantics, they are wrong from the point of view of SQL semantics, and this because the comparison of values in which one of them is NULL equates to a NULL, thus resulting the above behavior in which both expressions from the 4th and 5th example equate to false. The next two examples show how the NULLs should be handled in comparisons with the help of IS operator, respectively it’s negation – IS NOT.

Like in the case of numeric values, the comparison between two strings could be expressed by using the “less than” (“<;”) and “greater than” (“?”) operators, alone or in combination with the equality operator (“<=”, “>=”) or the negation operator (“!>”, “<!”) (see comparison operators in MDSN). Typically an SQL Server database is case insensitive, so there will be no difference between the following strings: “ABC”, “abc”, “Abc”, etc. Here are some examples:

-- sample comparisons (case sensitive) 
SELECT CASE WHEN 'abc' < 'ABC' THEN 1 ELSE 0 END Example1 
, CASE WHEN 'abc' > 'abc' THEN 1 ELSE 0 END Example2 
, CASE WHEN 'abc' >= 'abc ' THEN 1 ELSE 0 END Example3 
, CASE WHEN 'abc' <> 'ABC' THEN 1 ELSE 0 END Example4 
, CASE WHEN 'abc' > '' THEN 1 ELSE 0 END Example5 
, CASE WHEN ' ' > '' THEN 1 ELSE 0 END Example6

Output:

Example1	Example2	Example3	Example4	Example5	Example6
0	0	1	0	1	0

The case sensitivity could be changed at attribute, table or database level. As we don’t deal with a table and the don’t want to complicate too much the queries, let’s consider changing the sensitivity at database level. So if you are using a non-production database, try the following script in order to enable, respectively to disable the case sensitivity:

--enabling case sensitivity for a database 
ALTER DATABASE <database name>  
COLLATE Latin1_General_CS_AS  

--disabling case sensitivity for a database 
ALTER DATABASE <database name> 
COLLATE Latin1_General_CI_AS

In order to test the behavior of case sensitivity, enable first the sensitivity and then rerun the previous set of example (involving case sensitivity).
Output:

Example1	Example2	Example3	Example4	Example5	Example6
1	0	1	1	1	0

After that you could disable again the case sensitivity by running the last script. Please note that if your database has other collation, you’ll have to change the scripts accordingly in order to point to your database’s collation.

Notes:
The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

03 January 2011

💎SQL Reloaded: Pulling the Strings of SQL Server V (Character Indexes)

A string is in fact a concatenation or chain of characters, each character in the string having a position referred as character’s index. In programming it’s useful to identify the first occurrence of a character or string in forward or backwards order, with a few simple tricks being possible to identify all the occurrences of the characters or whole strings in a string. Such functions are appearing in some of the well known programming languages under names like InStr, IndexOf for forward processing, respectively, InStrRev, LastIndexOf. SQL Server has only one function for this purpose, namely CharIndex, used to retrieve the position where a character is found starting from a given position. If no occurrence was found, the function returns 0, while if no start position if given, the default is 0. There are several other remarks that could be found in MSDN, and it’s actually always a good idea to consult the documentation from time to time, not only when meeting a new function. But let’s get back to the CharIndex function with a few examples based on searching a character within a given string:

-- searching a character into a string 
DECLARE @string varchar(50)  
SET @string = 'this is a string'  
SELECT CharIndex(' ', @string) Example1  
, CharIndex(NULL, @string) Example2 
, CharIndex(NULL, @string, NULL) Example3 
, CharIndex(NULL, NULL, NULL) Example4

Output:

Example1	Example2	Example3	Example4
5	NULL	NULL	NULL

If in the first example the function is used to retrieve the first occurrence of a space in the given string, in the last three examples is highlighted the behavior of the function in case one of the parameters is NULL, in such cases the function returning a NULL – this means that when other behavior is expected, it falls in your attributions to handle the NULL values accordingly.

Here’s a second set of examples based on the search of a string within another string. As can be seen there isn’t a difference at all in how the function is applied:

-- searching a string into a string
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_is_another_string'  
SET @search = 'is'  
SELECT @string Example1 -- simple string  
, CharIndex(@search, @string) Example2 -- occurring position  
, CharIndex(@search, @string, 5) Example3 -- second occurring position

Output:

Example1	Example2	Example3
this_is_another_string	3	6

From the three examples, the third deserves some special attention because it attempts to retrieve the second occurrence of the string by using a hardcoded starting value. In fact the search could start a position further from the first occurrence, something like in the below example:

-- searching a string into a string in foward/reverse order 
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_was_a_string'  
SET @search = '_'  
SELECT @string Example1 -- simple string  
, CharIndex(@search, @string) Example2 -- 1st occuring position (forward)  
, CharIndex(@search, @string, CharIndex(@search, @string)+1) Example3 -- 2nd occuring position (forward)  
, CharIndex(@search, @string, CharIndex(@search, @string, CharIndex(@search, @string)+1)+1) Example4 -- 3rd 
occuring position (forward)

Output:

Example1	Example2	Example3	Example4
this_was_a_string	5	9	11

The logic could be repeated over and over again in order to retrieve the n-th position, list all the positions of occurrence or the total number of occurrences.

Even if SQL Server doesn’t provide a function for retrieving the first occurrence in backward order, in other words starting from the end to the start, there is actually a simple trick that might do it. As SQL Server provides a Reverse function, which reverses the value of the string provided as parameter.

-- 1st occurring position backwards 
DECLARE @string varchar(50)  
DECLARE @search varchar(50)  
SET @string = 'this_was_a_string'  
SET @search = '_'  
SELECT Len(@string) - CharIndex(@search, Reverse(@string))+1 Example

Unfortunately the logic needed to retrieve the second and following occurrences starting from a given position it’s not so easy readable. Therefore, for the sake of “simplicity”, here’s the logic for reverse processing incorporated in CharIndexRev function:

-- reverse CharIndex 
CREATE FUNCTION dbo.CharIndexRev( 
@search varchar(50) 
, @string varchar(500) 
, @start int = 0) 
RETURNS int 
AS BEGIN 
RETURN Len(@string) - CharIndex(@search, Reverse(@string), CASE WHEN IsNull(@start, 0) != 0 THEN Len(@string) - @start+1 ELSE 0 END)+1 
END

And here’s the function at work:

-- searching a string into a string in reverse order 
DECLARE @string varchar(50) 
DECLARE @search varchar(50) 
SET @string = 'this_was_a_string' 
SET @search = '_' 
SELECT dbo.CharIndexRev(NULL, @string, 0) Example1 
, dbo.CharIndexRev(@search, NULL, 0) Example2 
, dbo.CharIndexRev(NULL, NULL, 0) Example3 
, dbo.CharIndexRev(@search, @string, 0) Example4 
, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, 0)-1) Example5 
, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, dbo.CharIndexRev(@search, @string, 0)-1)-1) Example6

Output:

Example1	Example2	Example3	Example4	Example5	Example6
NULL	NULL	NULL	11	9	5

As can be seen, the output of the last three examples matches the reversed output of the previous set (11, 9, 5) vs. (5, 9, 11), while the first three examples exemplify the behavior of the function when one of the parameters is Null.

Notes:
The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

14 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server III: (Concatenation)

Typically, a database in general, and a table in particular, that follows the normalization rules, is designed to have the columns contain the smallest semantic chunks of data, it could be a Street, a Zip Code, City, a person’s First or Last Name, but also a large chunk of text like a Description or a Comment. No matter how well designed is a database, there will always be the need to do various operations with strings, typically concatenation, extraction of subpart of a string, insertion or deletion of characters, rearangement of string’s characters, trimming, splitting it in substrings, or of getting various numeric values: length, position of a given text, number of not empty characters, on whether the text represents a valid numeric or date values, etc. In the following posts I will attempt to address the respective operations in the context of select statements, and let’s start with concatenation.

Concatenation is the operation of joining two or more string values within an expression. In SQL Server the “+” operator is used for concatenation, and it could be used to concatenate two or more members. In order to concatenate two members from which one of them is a string, the second term needs to be explicitly converted to a string data type, otherwise an error will occur. For readability or post-processing purposes, the strings are concatenated using a delimiter in order to delimit the boundaries of the initial value, it could be used a space, a comma, a semicolon, a pipe, a tab or any other character that could be used typically in delimiting columns.

-- concatenation of strings 
SELECT 'a string' + ' and ' + 'a second string' Example1  
, 'a string' + ',' + 'a second string' Example2  
, '1245' + '67890' Example3  
, '1245' + '.' + '67890' Example4

The concatenation of string variables or columns functions based on the same principles:

-- concatenating string variables 
DECLARE @string1 varchar(50) 
DECLARE @string2 varchar(50) 
DECLARE @string3 varchar(50) 

SET @string1 = 'this is a string'  
SET @string2 = 'this is another string'  

SELECT @string1 + ' and ' + @string2 Example1 
, @string1 + char(31) + @string2 Example2 
, @string1 + ', ' + @string2 Example3 
, @string1 + ' ' + @string3 Example4 
, @string1 + IsNull(@string3, '!') Example5

Here’s another example based on the concatenation of columns coming from two joined tables from AdventureWorks database:

-- concatenating columns of joined tables 
SELECT PAD.AddressID 
, IsNull(PAD.AddressLine1, '')  
+ IsNull(', ' + PAD.AddressLine2, '') 
+ IsNull(', ' + PAD.City, '') 
+ IsNull(', ' + PAD.PostalCode, '') 
+ IsNull(', ' + PSP.Name, '') Address 
FROM Person.Address PAD 
      JOIN Person.StateProvince PSP 
         ON PAD.StateProvinceID = PSP.StateProvinceID

As stressed in the previous post, the NULL values need to be adequately handled either by initializing values or by using the IsNull or COALESCE functions. The concatenation of strings combined with IsNull function could be used creatively in order to add a comma only when a value is not null, as in the above example.

There are scenarios in which is needed to concatenate the values belonging to the same column but from different records, for example concatenating the email values in order to send a single email to all the users in one single action. Before the introduction of common table expressions (CTE), wasn’t possible to concatenate the string values belonging to different records, at least not in a query, this functionality being achieved by using cursors or loops, or simply performed on client or intermediate layers. As I already gave an example on how to use cursor in order to loop through the values of a table and concatenate them (see “Cursors and Lists” post), I will focus on the use of loops and simple CTEs.

Loops are one of the basic functionality in programming languages, no matter of their complexity or type. Either if are talking about WHILE, FOR, foreach or do … until loops, the principles are the same: perform a list of actions until one or more conditions are met. In this case the actions performed is reduced to a set of concatenations based on the letters of the (English) alphabet:

-- concatenation within a loop 
DECLARE @list varchar(max) 
DECLARE @index int  
SET @list = '' 
SET @index = ASCII('a') WHILE (@index<ASCII('z')) 
BEGIN 
     SET @list = @list + ', ' + Char(@index) 
     SET @index = @index + 1 
END 
SELECT @list Result

There is more work that needs to be performed in order to remove the leading comma from the output, but that’s a topic for the next post, when discussing about decomposition of strings.

CTEs are a little more complex to use than the loops, though the concatenation could be achieved across records and this in one query and not in procedural language as in the above example. In order delimit the two components of a CTE, I made use of a second CTE which simulates the existence of a given table:

-- concatenation across records 
;WITH Data(Column1, Ranking)  
AS  
( -- preparing test data 
       SELECT 'A' Column1, 0 Ranking  
       UNION ALL  
       SELECT 'B' Column1, 1 Ranking  
       UNION ALL  
       SELECT 'C' Column1, 2 Ranking 
) 
, Result(Column1, Ranking)  
AS 
(  -- performing the actual concatenation 
      SELECT Cast(Column1 as nvarchar(max)) Column1 , Ranking

      FROM Data

      WHERE Ranking = 0  
      UNION ALL 
      SELECT Cast(B.Column1 + ',' + A.Column1 as nvarchar(max)) Column1 , A.Ranking  
      FROM Data A  
         JOIN Result  B  
             ON A.Ranking - 1 = B.Ranking  
) SELECT Column1  
FROM Result  
WHERE Ranking IN (SELECT MAX(Ranking) FROM Result)

The logic for doing a simple concatenation seems maybe complicated, though the volume of work is not so big if we ignore the first CTE. On the other side I introduced an “index” within the Ranking column, fact that allows processing easier the records. When dealing with the records coming from a table it’s probably much easier to use one of the ranking functions that suits best.

Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!

11 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server II (Creation and Selection)

It doesn’t make sense to talk about the creation of string data types without talking about their “selection” as people typically want to see also examples at work, and not only learn about theoretical facts. I’m saying that because often programming seems to be like putting together the pieces of a puzzle without knowing how the final image would look like. Anyway, coming back to the topic, what I find great about SQL Server is that it allows to “create” and select a string with a minimum overhead within a simple select statement:

-- selecting a string value 
SELECT 'this is a string' -- string 1 
, 'and this is a second string' --string2 

-- selecting a string value (named columns)  
SELECT 'this is a string' String1  
, 'and this is a second string' String2

Pretty simple, isn’t it? This kind of scripts could be useful especially when debugging logic or testing an expression (e.g. functions at work), the advantage residing in the fact that is not necessary to built a table in order to test simple things. When performing multiple operations with the same values could be handy to store the values in declared variables:

-- selecting string variables 
DECLARE @string1 varchar(50)

DECLARE @string2 char(50) 
DECLARE @string3 char(50) = 'this is a string'  

SET @string1 = 'this is a string'  

SELECT @string1 String1 , @string2 String2 
, @string3 String3

As can be seen a not initialized variable has by default the value NULL, fact that could lead to unexpected behavior problems when used in expressions involving multiple variables. Therefore it’s recommended to initialize the values with the empty string or at least to handle the nulls in expressions. Starting with SQL Server 2008 it’s possible to declare and initialize variables within the same statement, so the above code won’t work in previous versions, unless the third declaration is broken into two pieces as per definition of first string.

It looks like that’s all, at least in what concerns the direct declaration of a string, be it implicit or explicit. However a basic introduction into strings’ creation is incomplete without mentioning the creation of strings from other data types and the various functions that could be used for this purpose. Ignoring the Cast and Convert functions used to explicitly convert the other data types to string data types, there are several other functions for this purpose, namely Char, Space and Replicate.

Probably some of you are already familiar with the ASCII (American Standard Code for Information Interchange) character-encoding scheme (vs. binary encoding) used to encode files. ASCII code represents a mapping between number and characters, SQL Server supporting the transformation between two sets through the ASCII and Char functions. If ASCII translates a non-unicode character into an integer, the Char function translates an integer value into a non-unicode character. The integer values range between 0 and 255, they encompassing the 0-9 digits, the characters of English together with the diacritics of other important languages, punctuation signs and several graphical characters. The mapping between the two sets is unique, and as can be seen from the below example based on a common table expression, the functions are inverse:

-- ASCII character values 
WITH CTE 
AS (    
    SELECT 0 [Index]  
    UNION ALL 
    SELECT [Index]+1 [Index]  
    FROM CTE 
    WHERE [Index]<255 
) SELECT [Index] 
, CHAR([Index]) [Character] 
, ASCII(CHAR([Index])) [ASCII] 
FROM CTE 
OPTION (MAXRECURSION 0)

There is not much to say about the Space and Replicate functions, the Space function returns a string of repeated spaces, while the Replicate function forms a string as a repeated sequence of values. The definition of the Space function could be considered as redundant as long the same output could be obtained by using the space as repeating sequence.

-- Space & Replicate at work 
SELECT Space(1) Example1 
, Space(5) Example2 
, Replicate(' ', 1) Example3 
, Replicate(' ', 5) Example4 
, Replicate('10', 1) Example5 
, Replicate('10', 2) Example6 
, Replicate('10', 3) Example7

As per the last statement, the first and third examples, respectively the second and fourth example will return the same values, unfortunately the output of the respective examples it’s not so easy to see. For exemplification, it could have been enriched by comparing or concatenating the strings, though that’s a topic for the next post.

Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!

07 December 2010

💎SQL Reloaded: Pulling the Strings of SQL Server I 7- Introduction

The (character) string or simply the character data type, how is named by the MSDN documentation, is one of the primary data types available in a database and probably the most complex given the fact that it can encompass any combination of numeric, literal, date or any other primary data type. In fact it could include any type of chunk of text that could be written in any language as SQL Server supports Unicode and thus most of the (written) languages. In this post I’m not intending to make a complete anthology of strings, and neither to retake the whole volume of information available in MSDN or other important books. My intent is to look at strings from a slightly different perspective, considering the various functions involving strings and how they could be used in order to provide various functionality, in fact the cornerstone of everyday developer.

A few things to remember:

1. there were mainly two types of non-unicode strings: the char (or character) of fixed length, respectively the varchar of variable length (varying character)

2. if initially both types of strings were having a maximum length of 8000 of characters, with SQL Server it’s possible to have a varchar with maximum storage size, declared as varchar(max).

3. if in the past there were some strict recommendations in what concerns the use of char or varchar, nowadays the varchar tends to be used almost everywhere, even for the strings of length 1.

4. talking about length, it denotes the number of chracters a string stores.

5. the trailing spaces, the spaces found at the right extremity of a string are typically ignored, while the leading spaces, the spaces found at the left extremity, are not ignored.

6. starting with SQL Server 2000, for the two character data types were introduced the corresponding unicode data types prefixed with n (national): nchar, respectively nvarchar.

7. given the fact that a unicode character needs more space to store the same non-unicode string, actually the number of bits doubles, the maximum length for an unicode string is only 4000.

8. there is also a non-unicode text, respectively ntext unicode data type, designed to store maximum length, though as it seems they could become soon deprecated, so might be a good idea to avoid it.

9. not-initialized variables, including strings, have the value NULL, referred also the NULL string, therefore it’s always a good idea to initialize your variables.

10. by empty string string is designated the string containing no character “’’”, and has the length 0.

11. there are several specific functions available for the creation, manipulation and conversion of strings from and to other data types.

12. not all of the aggregated functions work with string values (specifically the ones requesting a number value like SUM, AVG, STDV).

13. the operations performed on strings of different data types are generally not impacted by this aspect, though there are some exceptions.

14. there are several (basic) operations with strings, typically concatenation, extraction of subpart of a string, insertion, replacement or deletion of characters, rearrangement of string’s characters, trimming, splitting it in substrings (decomposition), etc.

15. there are several numeric values based on strings: length, position of a given text in a text, number of not empty characters, encoding of a character, on whether the text represents a valid numeric or date values, etc.

16 March 2010

🔏MS Office: Excel for SQL Developers IV (Differences Between Two Datasets)

One of the tasks which appears from time to time on my table is to determine the differences between two datasets coming from different systems or from the same system/source but taken at different points in time. Even if it’s fairly simple to write such a query, especially when the number of attributes is quite small, there is lot of repetitive work that needs to be done because typically for each set of matched attributes needs to be added a third attribute (difference flag) showing whether there is a difference or not. In time I started to use Excel formulas to create the SELECT clause for such a query, matching tables’ attribute first and categorizing the data types mainly in 4 categories: text, amounts, numeric (other than amounts) and dates. This categorization is requested first of all by the different ways in handling the NULL values for each data type, and secondly by the difference between amounts and numeric, even if they might have the same data type, the difference resides in the fact that amounts might be rounded differently in each data source, therefore a difference of a few cents will not be considered as difference.

Given two attributes ColumnX and ColumnY from tables A, respectively B, let’s look first on how the difference flag constraint could be written for each category:

--text attributes: 
CASE 
    WHEN IsNull(A.ColumnX , '')<> ISNULL(B.ColumnY, '') THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--amount attributes: 
CASE 
    WHEN IsNull(A.ColumnX, 0) - IsNull(B.ColumnY, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--numeric attributes: 
CASE 
    WHEN IsNull(A.ColumnX, 0) <> IsNull(B.ColumnY, 0) THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--date attributes: 
CASE 
   WHEN IsNull(DateDiff(d, A.ColumnX, B.ColumnY), -1)<>0 THEN 'Y' 
   ELSE 'N' 
END DiffColumnXYFlag

Notes:
1.   Bit attributes can be treated as numeric as long as they are considered as having a bi-state, for tri-state values in which also NULL is considered as a distinct value then the constraint must be changed, the most natural way being to translate the NULL to –1:
CASE
    WHEN IsNull(A.ColumnX, -1) <> IsNull(B.ColumnY, -1) THEN 'Y'
    ELSE 'N'
END DiffColumnXYFlag
2.   In most of the examples I worked with the difference between two pair dates, the difference was calculated at day level, though it might happen that is needed to compare the values at smaller time intervals to the order of hours, minutes or seconds. The only thing that needs to be changed then is the first parameter from DateDiff function. There could be also situations in which a difference of several seconds is acceptable, a BETWEEN operator could be used then as per the case of numeric vs. amount values.
3.    In case one of the attributes is missing, the corresponding difference flag could take directly the value ‘N’ or ‘n/a’.
4.   It could happen that there are mismatches between the attributes’ data type, in this case at least one of them must be converted to a form that could be used in further processing.

    Thus a macro for this purpose would take as input a range with the list of attributes from the two tables, the data type category and the columns participating in the join constraint, two parameters designating the name of the left and right table participating in the FULL OUTER JOIN, the time interval considered, the error margin value interval (e.g. [-e, e]) and a flag indicating whether to show all combined data or only the records for which there is at least a difference found.

Function GetComparisonQuery(ByVal rng As Range, ByVal LeftTable As String, ByVal RightTable As String, ByVal TimeInterval As String, ByVal ErrorMargin As String, ByVal ShowOnlyDifferencesFlag As Boolean) As String 
'builds the code for a comparison query between two tables 
Dim attributes As String 
Dim differences As String 
Dim constraint As String 
Dim whereConstraints As String 
Dim joinConstraints As String 
Dim columnX As String 
Dim columnY As String 
Dim index As Integer 
For index = 1 To rng.Rows.Count 
    columnX = Trim(rng.Cells(index, 1).Value) 
    columnY = Trim(rng.Cells(index, 2).Value) 
  
    If Len(columnX) > 0 Or Len(columnY) > 0 Then 
       If Len(columnX) > 0 Then 
            attributes = attributes & ", A." & columnX & " LT" & columnX & vbCrLf 
       End If 
       If Len(columnY) > 0 Then 
            attributes = attributes & ", B." & columnY & " RT" & columnX & vbCrLf 
       End If 
       
       constraint = "" 
       If Len(Trim(rng.Cells(index, 4).Value)) = 0 Then 
            If Len(columnX) > 0 And Len(columnY) > 0 Then 
                 'creating the difference flag 
                 Select Case Trim(rng.Cells(index, 3).Value) 
                 Case "text": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , '') <> IsNUll(B." & columnY & ", '') THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "amount": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , 0) - IsNUll(B." & columnY & ", 0) NOT BETWEEN -" & ErrorMargin & " AND " & ErrorMargin & " THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "numeric": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , 0) <> IsNUll(B." & columnY & ", 0) THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "date": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN DateDiff(" & TimeInterval & ", A." & columnX & ", B." & columnY & ")<>0 THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case Else: 'error 
                     MsgBox "Incorrect data type provided for " & index & " row!", vbCritical 
                 End Select 
                 
                
                If ShowOnlyDifferencesFlag Then 
                   whereConstraints = whereConstraints & " OR " & constraint & " = 'Y'" & vbCrLf 
                End If 
                
                differences = differences & ", " & constraint & " Diff" & columnX & "Flag" & vbCrLf 
            Else 
                differences = differences & ", 'n/a' Diff" & IIf(Len(columnX) > 0, columnX, columnY) & "Flag" & vbCrLf 
            End If 
                             
        Else 
            joinConstraints = joinConstraints & "    AND A." & columnX & " = B." & columnY & vbCrLf 
        End If 
     
     End If 
Next 
If Len(attributes) > 0 Then 
    attributes = Right(attributes, Len(attributes) - 2) 
End If 
If Len(joinConstraints) > 0 Then 
    joinConstraints = Right(joinConstraints, Len(joinConstraints) - 8) 
End If 
If Len(whereConstraints) > 0 Then 
    whereConstraints = Right(whereConstraints, Len(whereConstraints) - 4) 
End If 
'building the comparison query 
GetComparisonQuery = "SELECT " & attributes & _ 
    differences & _ 
    "FROM " & LeftTable & " A" & vbCrLf & _ 
    "     FULL OUTER JOIN " & RightTable & " B" & vbCrLf & _ 
    "       ON " & joinConstraints & _ 
    IIf(ShowOnlyDifferencesFlag And Len(whereConstraints) > 0, "WHERE " & whereConstraints, "") 
   
End Function

The query returned by the macro for the above example based on attributes from Production.Product table from AdventureWorks database and Production.Products table created in Saving Data With Stored Procedures post:

SELECT A.ProductID LTProductID 
, B.ProductID RTProductID 
, A.Name LTName 
, B.Name RTName 
, A.ProductNumber LTProductNumber 
, B.ProductNumber RTProductNumber 
, A.MakeFlag LTMakeFlag 
, B.MakeFlag RTMakeFlag 
, A.FinishedGoodsFlag LTFinishedGoodsFlag 
, B.FinishedGoodsFlag RTFinishedGoodsFlag 
, A.Color LTColor 
, B.Color RTColor 
, A.SafetyStockLevel LTSafetyStockLevel 
, B.SafetyStockLevel RTSafetyStockLevel 
, A.ReorderPoint LTReorderPoint 
, A.StandardCost LTStandardCost 
, B.StandardCost RTStandardCost 
, A.ListPrice LTListPrice 
, B.ListPrice RTListPrice 
, A.Size LTSize 
, A.DaysToManufacture LTDaysToManufacture 
, A.ProductLine LTProductLine 
, A.Class LTClass 
, A.Style LTStyle 
, A.SellStartDate LTSellStartDate 
, B.StartDate RTSellStartDate 
, A.SellEndDate LTSellEndDate 
, B.EndDate RTSellEndDate 
, A.DiscontinuedDate LTDiscontinuedDate 
, B.CreationDate RT 
, B.CreatedBy RT 
, A.ModifiedDate LTModifiedDate 
, B.ModifiedDate RTModifiedDate 
, B.ModifiedBy RT 
, CASE 
    WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' 
    ELSE 'N' 
END DiffNameFlag 
, CASE 
    WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' 
    ELSE 'N' 
END DiffProductNumberFlag 
, CASE 
    WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' 
    ELSE 'N' 
END DiffMakeFlagFlag 
, CASE 
    WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' 
    ELSE 'N' 
END DiffFinishedGoodsFlagFlag 
, CASE 
     WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' 
    ELSE 'N' 
END DiffColorFlag 
, CASE 
    WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' 
    ELSE 'N' 
END DiffSafetyStockLevelFlag 
, 'n/a' DiffReorderPointFlag 
, CASE 
     WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05       
     THEN 'Y' 
ELSE 'N' 
END DiffStandardCostFlag 
, CASE 
     WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
     ELSE 'N' 
END DiffListPriceFlag 
, 'n/a' DiffSizeFlag 
, 'n/a' DiffDaysToManufactureFlag 
, 'n/a' DiffProductLineFlag 
, 'n/a' DiffClassFlag 
, 'n/a' DiffStyleFlag 
, CASE 
    WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' 
    ELSE 'N' 
END DiffSellStartDateFlag 
, CASE 
    WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' 
    ELSE 'N' 
END DiffSellEndDateFlag 
, 'n/a' DiffDiscontinuedDateFlag 
, 'n/a' DiffCreationDateFlag 
, 'n/a' DiffCreatedByFlag 
, CASE 
     WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' 
     ELSE 'N' 
END DiffModifiedDateFlag 
, 'n/a' DiffModifiedByFlag 
FROM Production.Product A 
    FULL OUTER JOIN Production.Products B 
       ON A.ProductID = B.ProductID 
WHERE CASE 
     WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
    WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05 
      THEN 'Y' 
ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
       WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y'

Notes:
1.   The macro doesn’t consider an ORDER BY clause, though it could be easily added manually
2.   Not all of the join constraints are so simple so that they can be reduced to one or more simple equalities, on the other side we have to consider that the most time consuming task is listing the attributes and the difference flags.
3.    Sometimes it’s easier to create two extracts – in the first being considered all the records from the left table and the matches from the right table (left join), respectively all the records from the right table and the matches from the left table (right join).
4.     Given the fact that the attributes participating in the join clause should in theory match, each pair of such attributes could be merged in one attribute using the formula: IsNull(A.ColumnX, B.ColumnY) As ColumnXY.
5.     In order to show all the data from the two tables and not only the differences, all is needed to do is to change the value of the last parameter from true to false:
=GetComparisonQuery(A2:D23, "Production.Product", "Production.Products", "d", "0.05", false)
6.     For TimeInterval parameter should be provided only the values taken by DatePart parameter (first parameter) of SQL Server’s DateDiff function.
7.     Please note that no validation is made for the input parameters.

SQL Troubles

Pages

20 February 2025

💠🛠️🗒️SQL Server: Nulls [Notes]

07 January 2011

💎🏭SQL Reloaded: Pulling the Strings of SQL Server IV (Spaces, Trimming, Length and Comparisons)

03 January 2011

💎SQL Reloaded: Pulling the Strings of SQL Server V (Character Indexes)

14 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server III: (Concatenation)

11 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server II (Creation and Selection)

07 December 2010

💎SQL Reloaded: Pulling the Strings of SQL Server I 7- Introduction

16 March 2010

🔏MS Office: Excel for SQL Developers IV (Differences Between Two Datasets)

About Me